Recombinantly produced spider silk

ABSTRACT

The invention relates to novel spider silk protein analogs derived from the amino acid consensus sequence of repeating units found in the natural spider dragline of Nephila clavipes. More specifically, synthetic spider dragline has been produced from E. coli and Bacillus subtilis recombinant expression systems wherein expression from E. coli is at levels greater than 1 mg full-length polypeptide per gram of cell mass.

This application is filed under 35 USC 371 as the national stage of International Application PCT/US94/06689 Jun. 15, 1994, a continuation-in-part of U.S. application Ser. No. 08/077,600, filed Jun. 15, 1993, now abandoned, and claims priority under 35 USC 120 as a continuation-in-part to U.S. application Ser. No. 08/077,600, filed Jun. 15, 1993, now abandoned.

FIELD OF THE INVENTION

The invention relates to novel spider silk protein analogs derived from the amino acid consensus sequence of repeating units found in the natural spider dragline of Nephila clavipes. More specifically, synthetic spider dragline has been produced from E. coli and Bacillus subtilis recombinant expression systems wherein expression from E. coli is at levels greater than 1 mg full-length polypeptide per gram of cell mass.

BACKGROUND

Ever-increasing demands for materials and fabrics that are both light-weight and flexible without compromising strength and durability have created a need for new fibers possessing higher tolerances for such properties as elasticity, denier, tensile strength and modulus. The search for a better fiber has led to the investigation of fibers produced in nature, some of which possess remarkable qualities. The virtues of natural silk produced by Bombyx mori (silk worm) have been well known for years, but it is only recently that other naturally produced silks have been examined.

Spider silks have been demonstrated to have several desirable characteristics. The orb-web-spinning spiders can produce silk from six different types of glands. Each of the six fibers has different mechanical properties. However, they all have several features in common. They (i) are composed predominantly or completely of protein; (ii) undergo a transition from a soluble to an insoluble form that is virtually irreversible; and (iii) are composed of amino acids dominated by alanine, serine, and glycine and have substantial quantities of other amino acids, such as glutamine, tyrosine, leucine, and valine. The spider dragline silk fiber has been proposed to consist of pseudocrystalline regions of antiparallel β-sheet structure interspersed with elastic amorphous segments.

The spider silks range from those displaying a tensile strength greater than steel (7.8 vs 3.4 G/denier) and those with an elasticity greater than wool, to others characterized by energy-to-break limits that are greater than KEVLAR® (1×10⁵ vs 3×10⁴ J kg⁻¹). Given these characteristics, spider silk could be used as a light-weight, high strength fiber for various textile applications.

Considerable difficulty has been encountered in attempting to solubilize and purify natural spider silk while retaining the molecular-weight integrity of the fiber. The silk fibers are insoluble except in very harsh agents such as LiSCN, LiClO4, or 88% (vol/vol) formic acid. Once dissolved, the protein precipitates if dialyzed or if diluted with typical buffers. Another disadvantage of spider silk protein is that only small amounts are available from cultivated spiders, making commercially useful quantities of silk protein unattainable at a reasonable cost. Additionally, multiple forms of spider silks are produced simultaneously by any given spider. The resulting mixture has less application than a single isolated silk because the different spider-silk proteins have different properties and, due to solubilization problems, are not easily separated by methods based on their physical characteristics. Hence the prospect of producing commercial quantities of spider silk from natural sources is not a practical one and there remains a need for an alternate mode of production. The technology of recombinant genetics provides one such mode.

By the use of recombinant DNA technology it is now possible to transfer DNA between different organisms for the purposes of expressing desired proteins in commercially useful quantities. Such transfer usually involves joining appropriate fragments of DNA to a vector molecule, which is then introduced into a recipient organism by transformation. Transformants are selected by a known marker on the vector, or by a genetic or biochemical screen to identify the cloned fragment. Vectors contain sequences that enable autonomous replication within the host cell, or allow integration into a chromosome in the host.

If the cloned DNA sequence encodes a protein, a series of events must occur to obtain synthesis of this foreign protein in an active form in the host cell. Promoter sequences must be present to allow transcription of the gene by RNA polymerase, and a ribosome binding site and initiation codon must be present in the transcribed mRNA for translation by ribosomes. These transcriptional and translational recognition sequences are usually optimized for effective binding by the host RNA polymerase and ribosomes, and by the judicious choice of vectors, it is often possible to obtain effective expression of many foreign genes in a host cell.

While many of the problems of efficient transcription and translation have been generally recognized and, for the most part, overcome, the synthesis of fiber-forming foreign polypeptides containing high numbers of repeating units poses unique problems. Genes encoding proteins of this type are prone to genetic instability due to the repeating nucleic acid sequences. Ideally, they encode proteins of high molecular weight, consisting of at least 800 amino acid residues, and generally with restricted amino acid compositions. While E. coli produces endogenous proteins in excess of 1000 residues, production of long proteins of restricted amino acid composition appears to place an unbalanced strain on the biosynthetic system, resulting in the production of truncated products, probably due to abortive translation.
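The link between repeated nucleic acid sequence and instability can be made concrete with a rough measure of repetitiveness. The following is a minimal sketch, not a method taken from this specification: the function name `repeated_kmers`, the window size of 12 nucleotides, and both test sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

def repeated_kmers(dna: str, k: int = 12) -> dict[str, int]:
    """Return every k-mer that occurs more than once, with its count."""
    counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}

# Hypothetical sequences for illustration only. Both encode the same
# peptide, (Gly-Ala-Gly-Gln) x 3, but with different codon choices.
tandem = "GGTGCTGGTCAG" * 3                      # identical codons in every repeat
varied = "GGTGCTGGTCAGGGCGCAGGACAAGGAGCGGGCCAG"  # synonymous codons varied

print(len(repeated_kmers(tandem)))  # number of distinct repeated 12-mers
print(len(repeated_kmers(varied)))
```

In this toy case the tandem design contains repeated 12-mers while the codon-varied design contains none, even though both encode the same amino acid sequence.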

In spite of the above mentioned difficulties, recombinant expression of fiber forming proteins is known in the art. Chatellard et al., Gene, 81, 267, (1989) teach the cloning and expression of the trimeric fiber protein of human adenovirus type 2 from E. coli. The gene expression system relied upon bacteriophage T7 RNA polymerase and optimal gene expression was obtained at 30° C. where the foreign protein attained levels of 1% of total host protein.

Goldberg et al., Gene, 80, 305, (1989) disclose the cloning and expression in E. coli of a synthetic gene encoding a collagen analog (poly (Gly-Pro-Pro)). The largest DNA insert was on the order of 450 base pairs and it was suggested that large segments of highly-repeated DNA may be unstable in E. coli.

Ferrari et al. (WO 8803533) disclose methods and compositions for the production of polypeptides having repetitive oligomeric units such as those found in silk-like proteins and elastin-like proteins by the expression of synthetic structural genes. The DNA sequences of Ferrari encode peptides containing an oligopeptide repeating unit which contains at least 3 different amino acids and a total of 4-30 amino acids, there being at least 2 repeating units in the peptide and at least 2 identical amino acids in each repeating unit.

Cappello et al. (WO 9005177) teach the production of a proteinaceous polymer from transformed prokaryotic hosts comprising strands of repeating units which can be assembled into aligned strands and DNA sequences encoding the same. The repeating units are derived from natural polymers such as fibroin, elastin, keratin or collagen.

The cloning and expression of silk-like proteins is also known. Ohshima et al., Proc. Natl. Acad. Sci. U.S.A., 74, 5363, (1977) reported the cloning of the silk fibroin gene complete with flanking sequences of Bombyx mori into E. coli. Petty-Saphon et al. (EP 230702) disclose the recombinant production of silk fibroin and silk sericin from a variety of hosts including E. coli, Saccharomyces cerevisiae, Pseudomonas sp., Rhodopseudomonas sp., Bacillus sp., and Streptomyces sp. In the preferred embodiments the expression of silk proteins derived from Bombyx mori is discussed.

Progress has also been made in the cloning and expression of spider silk proteins. Xu et al., Proc. Natl. Acad. Sci. U.S.A., 87, 7120, (1990) report the determination of the sequence for a portion of the repetitive sequence of a dragline silk protein, Spidroin 1, from the spider Nephila clavipes, based on a partial cDNA clone. The repeating unit is a maximum of 34 amino acids long and is not rigidly conserved. The repeat unit is composed of two different segments: (i) a 10 amino acid segment dominated by a polyalanine sequence of 5-7 residues; (ii) a 24 amino acid segment that is conserved in sequence but has deletions of multiples of 3 amino acids in many of the repeats. The latter sequence consists predominantly of GlyXaaGly motifs, with Xaa being alanine, tyrosine, leucine, or glutamine. The codon usage for this DNA is highly selective, avoiding the use of cytosine or guanine in the third position.
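The GlyXaaGly motif described above can be located in a one-letter amino acid sequence with a short pattern search. The following is a minimal sketch: the 24-residue `SEGMENT` is hypothetical and is not the published Spidroin 1 sequence, and the function name `find_gxg_motifs` is an assumption made for illustration.

```python
import re

# Hypothetical 24-residue segment in one-letter code; it is NOT the
# published Spidroin 1 sequence, only an illustration of the motif pattern.
SEGMENT = "GAGQGGYGGLGGQGAGRGGLGGQG"

# Gly-Xaa-Gly with Xaa restricted to Ala, Tyr, Leu or Gln, as in the text.
# The lookahead lets overlapping motifs that share a glycine both be found.
MOTIF = re.compile(r"(?=(G[AYLQ]G))")

def find_gxg_motifs(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, motif) for every Gly-Xaa-Gly occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(seq)]

motifs = find_gxg_motifs(SEGMENT)
for position, motif in motifs:
    print(position, motif)
```

A lookahead is used because adjacent motifs can share a glycine (for example GAG followed by GQG in GAGQG), and a plain non-overlapping search would miss the second one.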

Hinman and Lewis, J. Biol. Chem. 267, 19320 (1992) report the sequence of a partial cDNA clone encoding a portion of the repeating sequence of a second fibroin protein, Spidroin 2, from dragline silk of Nephila clavipes. The repeating unit of Spidroin 2 is a maximum of 51 amino acids long and is also not rigidly conserved. The frequency of codon usage of the Spidroin 2 cDNA is very similar to Spidroin 1.

Lewis et al. (EP 452925) disclose the expression of spider silk proteins, including protein fragments and variants, of Nephila clavipes from transformed E. coli. Two distinct proteins were independently identified and cloned and were distinguished as silk protein 1 (Spidroin 1) and silk protein 2 (Spidroin 2).

Lombardi et al. (WO 9116351) teach the production of recombinant spider silk protein comprising an amorphous domain or subunit and a crystalline domain or subunit where the domain or subunit refers to a portion of the protein containing a repeating amino acid sequence that provides a particular mechanostructural property.

The above mentioned expression systems are useful for the production of recombinant silks and silk variants; however, all rely on the specific cloned gene of a silk producing organism. One detrimental effect of such systems is that codon usage is not optimized for the production of foreign proteins in a recombinant host. It is well known in the art that expression of a foreign gene is more efficient if codons not favored by the organism in which expression is desired are avoided. Foreign genes cloned into recombinant hosts often rely on a codon usage not typically found in the host. This often results in poor yields of foreign protein.
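The idea of avoiding codons not favored by the host can be sketched as a simple back-translation. This is a minimal illustration, not the design procedure of the invention: the table `PREFERRED_CODON` lists one valid codon per residue that is assumed here to be favored in E. coli, and neither the table nor the function name `back_translate` comes from this specification.

```python
# Illustrative codon choices only: one codon per residue, assumed for this
# sketch to be favored by E. coli and not taken from the specification.
PREFERRED_CODON = {
    "G": "GGT", "A": "GCG", "Q": "CAG", "L": "CTG",
    "Y": "TAC", "S": "AGC", "R": "CGT",
}

def back_translate(peptide: str, table: dict[str, str] = PREFERRED_CODON) -> str:
    """Encode a one-letter peptide using a single chosen codon per residue."""
    missing = sorted(set(peptide) - set(table))
    if missing:
        raise KeyError(f"no codon chosen for residues: {missing}")
    return "".join(table[aa] for aa in peptide)

# Hypothetical 10-residue peptide used only as an example input.
dna = back_translate("GAGQGGYGGL")
print(dna)
```

Using a single codon per residue is the simplest possible choice. For a highly repetitive protein it also makes the DNA itself more repetitive, so a practical design would likely alternate among several acceptable codons.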

There remains a need therefore for a method to produce a spider silk protein in commercially useful quantities. It is the object of the present invention to meet such need by providing novel DNA sequences encoding variants of consensus sequences derived from spider silk proteins capable of being expressed in a foreign host having the ability to produce synthetic proteins in commercially useful amounts of 1% to 30% of total host protein.

SUMMARY OF THE INVENTION

The present invention provides novel synthetic spider dragline variant proteins produced by a process comprising the steps of: designing a DNA monomer sequence of between about 50 bp and 1000 bp which codes for a polypeptide monomer consisting of a variant of a consensus sequence derived from the fiber forming regions of spider dragline protein; assembling the DNA monomer; polymerizing the DNA monomer to form a synthetic gene encoding a full length silk variant protein; transforming a suitable host cell with a vector containing the synthetic gene; expressing the DNA polymer whereby the protein encoded by the DNA polymer is produced at levels greater than 1 mg full-length protein per gram of cell mass; and recovering the protein in a useful form.

The present invention provides novel plasmids containing DNA compositions encoding spider silk variant proteins and novel transformed host cells containing these plasmids which are capable of expressing the silk variant protein at levels greater than 1 mg full-length polypeptide per gram of cell mass.

Also included in the scope of the invention are transformed host cells capable of secreting full-length spider dragline protein analogs into the cell growth medium.

In a preferred embodiment, an artificial gene is constructed to encode an analog of a spider silk protein, one of the proteins of the dragline fiber of Nephila clavipes. Means are provided whereby such an artificial gene can be assembled and polymerized to encode a protein of approximately the same length as the natural protein. Further, means are provided whereby such an artificial gene can be expressed in a regulated fashion in a bacterial host, producing large quantities of its protein product. This protein product can be prepared in purified form suitable for forming into a fiber. While the subject of the current invention is a spider silk variant protein, it should be understood that the invention can be extended to encompass other highly repetitive fiber forming proteins or variant forms of such natural proteins.

The present invention provides methods for the production of commercially useful quantities of spider silk proteins in microorganisms, using recombinant DNA technology. Microbial methods of production of such proteins would provide several advantages. For example, microbial sources would provide the basis for production of fiber-forming proteins in large quantities at low enough cost for commercial applications. Microbial hosts would allow the application of recombinant DNA technology for the construction and production of variant forms of fiber-forming proteins, as well as novel proteins that could extend the utility of such fibers. Furthermore, microbial production would permit the rapid preparation of samples of variant proteins for testing. Such proteins would be free of other proteins found in the natural fiber, allowing the properties of the individual proteins to be studied separately.

BRIEF DESCRIPTION OF THE DRAWINGS, SEQUENCE LISTING AND BIOLOGICAL DEPOSITS

FIG. 1 illustrates the amino acid sequence (SEQ ID NO.:19) of natural spider dragline protein Spidroin 1 as disclosed by Xu et al., Proc. Natl. Acad. Sci. U.S.A., 87, 7120, (1990).

FIG. 2A illustrates the amino acid sequence (SEQ ID NO:20) of the monomer of the spider silk DP-1A.9 analogue (SEQ ID NO:80).

FIG. 2B illustrates the amino acid sequence (SEQ ID NO:21) of the polymer of the spider silk DP-1A.9 analogue (SEQ ID NO:80).

FIG. 3A illustrates the amino acid sequence (SEQ ID NO:22) of the monomer of the spider silk DP-1B.9 analogue (SEQ ID NO:81).

FIG. 3B illustrates the amino acid sequence (SEQ ID NO:23) of the polymer of the spider silk DP-1B.9 analogue (SEQ ID NO:81).

FIG. 4A illustrates the synthetic oligonucleotide L (SEQ ID Nos. 24-26) used in the construction of the DNA monomer for DP-1 protein expression.

FIG. 4B illustrates the synthetic oligonucleotide M1 (SEQ ID Nos. 27-29) used in the construction of the DNA monomer for DP-1 protein expression.

FIG. 4C illustrates the synthetic oligonucleotide M2 (SEQ ID Nos. 30-32) used in the construction of the DNA monomer for DP-1 protein expression.

FIG. 4D illustrates the synthetic oligonucleotide S (SEQ ID Nos. 33-35) used in the construction of the DNA monomer for DP-1 protein expression.

FIG. 5 is a plasmid map illustrating the construction of plasmid pFP510 from pA126i. Plasmid pFP510 is used to construct plasmids for the assembly and polymerization of DNA monomers and genes encoding DP-1A analogs.

FIG. 6 is a plasmid map of plasmid pFP202 which is used to construct high level expression vectors.

FIG. 7A illustrates the double stranded synthetic oligonucleotide A (SEQ ID Nos. 41-43) used in the construction of the DNA monomer for DP-2 protein expression.

FIG. 7B illustrates the double stranded synthetic oligonucleotide B (SEQ ID Nos. 44-46) used in the construction of the DNA monomer for DP-2 protein expression.

FIG. 7C illustrates the double stranded synthetic oligonucleotide C (SEQ ID Nos. 47-49) used in the construction of the DNA monomer for DP-2 protein expression.

FIG. 7D illustrates the double stranded synthetic oligonucleotide D (SEQ ID Nos. 50-52) used in the construction of the DNA monomer for DP-2 protein expression.

FIG. 7E illustrates the double stranded synthetic oligonucleotide E (SEQ ID Nos. 53-55) used in the construction of the DNA monomer for DP-2 protein expression.

FIG. 7F illustrates the double stranded synthetic oligonucleotide F (SEQ ID Nos. 56-58) used in the construction of the DNA monomer for DP-2 protein expression.

FIG. 8 illustrates the amino acid sequence (SEQ ID NO.:59) of the natural spider silk protein Spidroin 2 as described by Lewis et al. (EP 452925).

FIG. 9A illustrates the amino acid sequence of the amino acid monomer (SEQ ID NO:60) of the spider dragline protein 2 analog DP-2A (SEQ ID NO.:83).

FIG. 9B illustrates the amino acid sequence of the amino acid polymer (SEQ ID NO:61) of the spider dragline protein 2 analog DP-2A (SEQ ID NO.:83).

FIG. 10A illustrates the amino acid sequence of the amino acid monomer (SEQ ID NO:62) of the spider dragline protein 1 analog DP-1B.16 (SEQ ID NO.:82).

FIG. 10B illustrates the amino acid sequence of the amino acid polymer (SEQ ID NO:63) of the spider dragline protein 1 analog DP-1B.16 (SEQ ID NO.:82).

FIG. 11A illustrates the double stranded synthetic oligonucleotide 1 (SEQ ID Nos. 64-66) used to construct the synthetic genes encoding DP-1B.16 (SEQ ID NO:82).

FIG. 11B illustrates the double stranded synthetic oligonucleotide 2 (SEQ ID Nos. 67-69) used to construct the synthetic genes encoding DP-1B.16 (SEQ ID NO:82).

FIG. 11C illustrates the double stranded synthetic oligonucleotide 3 (SEQ ID Nos. 70-72) used to construct the synthetic genes encoding DP-1B.16 (SEQ ID NO:82).

FIG. 11D illustrates the double stranded synthetic oligonucleotide 4 (SEQ ID Nos. 73-75) used to construct the synthetic genes encoding DP-1B.16 (SEQ ID NO:82).

FIG. 12 is a plasmid map illustrating the construction of the plasmid pFP206 from pA126i. Plasmid pFP206 was used to construct plasmids used for the assembly and polymerization of the DNA monomer, and genes encoding DP-1B analogs.

FIG. 13A is a plasmid map of plasmid pA126i.

FIG. 13B illustrates the full sequence of plasmid pA126i (SEQ ID NO:78).

FIG. 13C is a continuation from FIG. 13B of the full sequence of plasmid pA126i (SEQ ID NO:78).

FIG. 13D is a continuation from FIG. 13C of the full sequence of plasmid pA126i (SEQ ID NO:78).

FIG. 14A is a plasmid map of pBE346.

FIG. 14B illustrates the complete DNA sequence (SEQ ID NO:79) of the plasmid pBE346.

FIG. 14C is a continuation from FIG. 14B of the complete DNA sequence (SEQ ID NO:79) of the plasmid pBE346.

FIG. 14D is a continuation from FIG. 14C of the complete DNA sequence (SEQ ID NO:79) of the plasmid pBE346.

FIG. 14E is a continuation from FIG. 14D of the complete DNA sequence (SEQ ID NO:79) of the plasmid pBE346.

FIG. 14F is a continuation from FIG. 14E of the complete DNA sequence (SEQ ID NO:79) of the plasmid pBE346.

FIG. 15A illustrates the construction of plasmid pFP169b from plasmid pFP541.

FIG. 15B illustrates the construction of plasmid pFP191 from pBE346.

FIG. 16A illustrates the synthetic double stranded oligonucleotide P1 (SEQ ID Nos:84-86) used to construct the synthetic genes encoding DP-1B.33.

FIG. 16B illustrates the synthetic double stranded oligonucleotide P2 (SEQ ID Nos:87-89) used to construct the synthetic genes encoding DP-1B.33.

FIG. 16C illustrates the synthetic double stranded oligonucleotide P3 (SEQ ID Nos:90-92) used to construct the synthetic genes encoding DP-1B.33.

FIG. 16D illustrates the synthetic double stranded oligonucleotide P4 (SEQ ID Nos:93-95) used to construct the synthetic genes encoding DP-1B.33.

FIG. 17 is a plasmid map of plasmid pHIL-D4, used to construct vectors for intracellular protein expression in Pichia pastoris.

FIG. 18 is a plasmid map of plasmid pPIC9, used to construct vectors for extracellular protein production in P. pastoris.

FIG. 19 illustrates the DNA sequence of a portion of plasmid pFO734, an intermediate in the construction of vectors for extracellular protein production in P. pastoris.

FIG. 20 illustrates DP-1B production by P. pastoris strain YFP5028.

FIG. 21 illustrates DP-1B production by P. pastoris strain YFP5093.

Applicants have provided sequence listings 1-107 in conformity with “Rules for the standard representation of nucleotide and amino acid sequence in patent applications” (Annexes I and II to the Decision of the President of the EPO, published in Supplement No. 2 to OJ EPO 12/1992).

Applicants have made the following biological deposits under the terms of the Budapest Treaty.

Deposit or Identification Reference    ATCC Designation    Deposit Date
Escherichia coli, FP 3227              69326               15 June 1993
Escherichia coli, FP 2193              69327               15 June 1993
Escherichia coli, FP 3350              69328               15 June 1993

As used herein, the designation “ATCC” refers to the American Type Culture Collection depository located in Manassas, Va. at 10801 University Blvd., Manassas, Va. 20110-2209, U.S.A. The “ATCC No.” is the accession number to cultures on deposit at the ATCC.

DETAILED DESCRIPTION OF THE INVENTION

The following definitions are used herein and should be referred to for interpretation of the claims and the specification.

As used herein, the terms “promoter” and “promoter region” refer to a sequence of DNA, usually upstream of (5′ to) the protein coding sequence of a structural gene, which controls the expression of the coding region by providing the recognition for RNA polymerase and/or other factors required for transcription to start at the correct site. Promoter sequences are necessary but not always sufficient to drive the expression of the gene.

A “fragment” constitutes a fraction of the DNA sequence of the particular region.

“Nucleic acid” refers to a molecule which can be single stranded or double stranded, composed of monomers (nucleotides) containing a sugar, phosphate and either a purine or pyrimidine. In bacteria, lower eukaryotes, and in higher animals and plants, “deoxyribonucleic acid” (DNA) refers to the genetic material while “ribonucleic acid” (RNA) is involved in the translation of the information from DNA into proteins.

The terms “peptide”, “polypeptide” and “protein” are used interchangeably.

“Regulation” and “regulate” refer to the modulation of gene expression controlled by DNA sequence elements located primarily, but not exclusively, upstream of (5′ to) the transcription start of a gene. Regulation may result in an all or none response to a stimulation, or it may result in variations in the level of gene expression.

The term “coding sequence” refers to that portion of a gene encoding a protein, polypeptide, or a portion thereof, and excluding the regulatory sequences which drive the initiation of transcription. The coding sequence may constitute an uninterrupted coding region or it may include one or more introns bounded by appropriate splice junctions. The coding sequence may be a composite of segments derived from different sources, naturally occurring or synthetic.

The term “construction” or “construct” refers to a plasmid, virus, autonomously replicating sequence, phage or nucleotide sequence, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3′ untranslated sequence into a cell.

As used herein, “transformation” is the acquisition of new genes in a cell by the incorporation of nucleic acid.

The term “operably linked” refers to the chemical fusion of two fragments of DNA in a proper orientation and reading frame to lead to the transcription of functional RNA.

The term “expression” as used herein is intended to mean the transcription and translation to gene product from a gene coding for the sequence of the gene product. In the expression, a DNA chain coding for the sequence of gene product is first transcribed to a complementary RNA which is often a messenger RNA and, then, the thus transcribed messenger RNA is translated into the above-mentioned gene product if the gene product is a protein.

The term “translation initiation signal” refers to a unit of three nucleotides (codon) in a nucleic acid that specifies the initiation of protein synthesis.

The term “signal peptide” refers to an amino terminal polypeptide preceding the secreted mature protein. The signal peptide is cleaved from and is therefore not present in the mature protein. Signal peptides have the function of directing and translocating secreted proteins across cell membranes. The signal peptide is also referred to as signal sequence.

The term “mature protein” refers to the final secreted protein product without any part of the signal peptide attached.

The term “plasmid” or “vector” as used herein refers to an extra-chromosomal element often carrying genes which are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA molecules.

The term “restriction endonuclease” refers to an enzyme which catalyzes hydrolytic cleavage within a specific nucleotide sequence in double-stranded DNA.

The term “compatible restriction sites” refers to different restriction sites that when cleaved yield nucleotide ends that can be ligated without any additional modification.
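The notion of compatible restriction sites can be illustrated with well-known enzyme pairs. The following is a minimal sketch: the choice of BamHI, BglII, XhoI and SalI is an assumption made for illustration and is not stated in this definition, and the names `ENZYMES`, `overhang` and `compatible` are hypothetical.

```python
# Recognition site and cut position on the top strand (number of bases
# before the cut). All four enzymes shown leave 4-base 5' overhangs.
ENZYMES = {
    "BamHI": ("GGATCC", 1),  # G^GATCC
    "BglII": ("AGATCT", 1),  # A^GATCT
    "XhoI": ("CTCGAG", 1),   # C^TCGAG
    "SalI": ("GTCGAC", 1),   # G^TCGAC
}

def overhang(name: str) -> str:
    """Return the single-stranded 5' overhang left by a palindromic cutter."""
    site, cut = ENZYMES[name]
    return site[cut:len(site) - cut]

def compatible(a: str, b: str) -> bool:
    """Treat two sites as compatible when their overhangs are identical."""
    return overhang(a) == overhang(b)

print(overhang("BamHI"), overhang("BglII"), compatible("BamHI", "BglII"))
print(overhang("XhoI"), overhang("SalI"), compatible("XhoI", "SalI"))
print(compatible("BamHI", "XhoI"))
```

The identity test is sufficient here only because each of these overhangs is its own reverse complement. Ligating a BamHI end to a BglII end gives the hybrid sequence GGATCT, which neither enzyme recognizes.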

The term “suitable promoter” will refer to any eukaryotic or prokaryotic promoter capable of driving the expression of a synthetic spider silk variant gene.

The term “spider silk variant protein” will refer to a designed protein, the amino acid sequence of which is based on repetitive sequence motifs and variations thereof that are found in a known natural spider silk.

The term “full length variant protein” will refer to any spider silk variant protein encoded by a synthetic gene which has been constructed by the assembly and polymerization of a DNA monomer.

The term “DNA monomer” will refer to a DNA fragment consisting of between 300 and 400 bp which encodes one or more repeating amino acid sequences of a spider silk variant protein. Examples of DNA monomers suitable for the present invention are illustrated in FIGS. 2, 3, 9 and 10.

The term “peptide monomer”, “polypeptide monomer” or “amino acid monomer” will refer to the amino acid sequence encoded by a DNA monomer.

The term “commercial quantities” will refer to quantities of recombinantly produced desired proteins where at least 1% of the total protein produced by a microbial culture is the desired protein.

The term “desired protein” will refer to any protein considered a valuable product to be obtained from genetically engineered bacteria.

The term “DP-1 analog” will refer to any spider silk variant derived from the amino acid sequence of the natural Protein 1 (Spidroin 1) of Nephila clavipes as illustrated in FIG. 1.

The term “DP-2 analog” will refer to any spider silk variant derived from the amino acid sequence of the natural Protein 2 (Spidroin 2) of Nephila clavipes as illustrated in FIG. 8.

As used herein the following abbreviations will be used to identify specific amino acids:

Amino Acid                     Three-Letter Abbreviation    One-Letter Abbreviation
Alanine                        Ala                          A
Arginine                       Arg                          R
Asparagine                     Asn                          N
Aspartic acid                  Asp                          D
Asparagine or aspartic acid    Asx                          B
Cysteine                       Cys                          C
Glutamine                      Gln                          Q
Glutamic acid                  Glu                          E
Glutamine or glutamic acid     Glx                          Z
Glycine                        Gly                          G
Histidine                      His                          H
Leucine                        Leu                          L
Lysine                         Lys                          K
Methionine                     Met                          M
Phenylalanine                  Phe                          F
Proline                        Pro                          P
Serine                         Ser                          S
Threonine                      Thr                          T
Tryptophan                     Trp                          W
Tyrosine                       Tyr                          Y
Valine                         Val                          V
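The abbreviations above can be applied mechanically to convert a sequence written in three-letter code into one-letter code. The following is a minimal sketch: the names `THREE_TO_ONE` and `to_one_letter` are hypothetical, and the entry for isoleucine (Ile, I) is an assumption added here because it is a standard residue that does not appear in the list above.

```python
# Three-letter to one-letter codes from the list above. "Ile" is a standard
# residue that the list omits; it is included here as an assumption.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Asx": "B",
    "Cys": "C", "Gln": "Q", "Glu": "E", "Glx": "Z", "Gly": "G",
    "His": "H", "Ile": "I", "Leu": "L", "Lys": "K", "Met": "M",
    "Phe": "F", "Pro": "P", "Ser": "S", "Thr": "T", "Trp": "W",
    "Tyr": "Y", "Val": "V",
}

def to_one_letter(three_letter_seq: str) -> str:
    """Convert e.g. 'GlyAlaGly' (no separators) into 'GAG'."""
    if len(three_letter_seq) % 3:
        raise ValueError("length must be a multiple of 3")
    codes = [three_letter_seq[i:i + 3] for i in range(0, len(three_letter_seq), 3)]
    return "".join(THREE_TO_ONE[code] for code in codes)

print(to_one_letter("GlyAlaGly"))
```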

The present invention also provides novel DNA sequences encoding spider silk protein variants that are suitable for expression of commercial quantities of silk protein in a recombinant host.

It will be appreciated that the advantages of such a protein and such a method are many. Spider silk, especially dragline silk, has a tensile strength of over 200 ksi with an elasticity of nearly 35%, which makes it more difficult to break than either KEVLAR or steel. When spun into fibers, spider silk of the present invention may have application in the bulk clothing industries as well as being applicable for certain kinds of high strength uses such as rope, surgical sutures, flexible tie downs for certain electrical components and even as a biomaterial for implantation (e.g., artificial ligaments or aortic banding). Additionally, these fibers may be mixed with various plastics and/or resins to prepare a fiber-reinforced plastic and/or resin product. Furthermore, since spider silk is stable up to 100° C., these fibers may be used to reinforce thermal injected plastics. These proteins may also be of value in the form of films or coatings. It will be appreciated by one of skill in the art that the properties of the silk fibers may be altered by altering the amino acid sequence of the protein.

The present invention provides a method for the production of analogs of natural spider silk proteins and variants using recombinant DNA technology. The method consists of (1) the design of analog protein sequences based on the amino acid sequence of the fiber forming regions of natural proteins; (2) the design of DNA sequences to encode such analog protein sequences, based on a DNA monomer of at least 50 bp with minimal internal repetitiveness, and making preferential use of codons matched to the preferences of a specific host organism; (3) assembly of the DNA monomer from cloned synthetic oligonucleotides; (4) polymerization of the DNA monomer to lengths of at least 800 bp, and preferably to lengths approximating the length of the gene encoding the natural protein; (5) inserting the polymerized artificial gene into an appropriate vector able to replicate in the host organism, in such a manner that the gene is operably linked to expression signals whereby its expression can be regulated; (6) producing the protein in the above mentioned microbial host carrying such an expression vector; (7) purifying the protein from the biomass and preparing it in a form suitable for forming into fibers, films, or coatings.
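The arithmetic behind step (4) can be sketched as follows. This is a minimal illustration under stated assumptions: the 303 bp monomer length is hypothetical (it falls within the 300 to 400 bp range given in the definition of a DNA monomer), and the function names `copies_needed` and `encoded_residues` are not taken from this specification.

```python
import math

def copies_needed(monomer_bp: int, target_bp: int) -> int:
    """Smallest number of tandem monomer copies whose length reaches target_bp."""
    if monomer_bp <= 0 or monomer_bp % 3:
        raise ValueError("monomer length must be a positive multiple of 3 to stay in frame")
    return math.ceil(target_bp / monomer_bp)

def encoded_residues(monomer_bp: int, copies: int) -> int:
    """Amino acid residues encoded by the polymerized coding sequence."""
    return monomer_bp * copies // 3

MONOMER_BP = 303  # hypothetical monomer length, encoding 101 residues

# Copies needed to reach the 800 bp minimum polymer length named in step (4).
print(copies_needed(MONOMER_BP, 800))

# Copies needed to encode at least 800 residues, i.e. 2400 bp of coding sequence.
n = copies_needed(MONOMER_BP, 800 * 3)
print(n, MONOMER_BP * n, encoded_residues(MONOMER_BP, n))
```

The check that the monomer length is a multiple of 3 reflects the requirement that head-to-tail copies remain in the same reading frame.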

The expression of the desired silk variant protein in Escherichia coli is preferred since this host reliably produces high levels of foreign protein and the art is replete with suitable transformation and expression vectors. However, it is not outside the scope of the invention to provide alternative hosts and particularly hosts that facilitate the secretion of the desired protein into the growth medium. Such alternative hosts may include but are not limited to Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, Aspergillus spp., Hansenula spp., and Streptomyces spp. The expression host preferred for the secretion of silk variant protein is Bacillus subtilis.

The present invention provides a variety of plasmids or vectors suitable for the cloning of portions of the DNA required for the assembly and expression of the silk variant protein gene in E. coli. Suitable vectors for construction contain a selectable marker and sequences allowing autonomous replication or chromosomal integration. Additionally, suitable vectors for expression contain sequences directing transcription and translation of the heterologous DNA fragment. These vectors comprise a region 5′ of the heterologous DNA fragment which harbors transcriptional initiation controls, and optionally a region 3′ of the DNA fragment which controls transcriptional termination. It is most preferred when both control regions are derived from genes homologous to E. coli although it is to be understood that such control regions need not be derived from the genes native to the specific species chosen as a production host. Suitable vectors can be derived, for example, from a bacterium, a virus (such as bacteriophage T7 or an M13-derived phage), a cosmid, a yeast or a plant. Protocols for obtaining and using such vectors are known to those in the art (Sambrook et al., Molecular Cloning: A Laboratory Manual, volumes 1, 2, 3 (Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y., 1989)).

Examples of bacteria-derived vectors include plasmid vectors such as pBR322, pUC19, pSP64, pUR278 and pORF1. Illustrative of suitable viral vectors are those derived from phage, vaccinia, retrovirus, baculovirus, or a bovine papilloma virus. Examples of phage vectors include λ⁺, λEMBL3, λ2001, λgt10, λgt11, Charon 4a, Charon 40, and λZAP/R. pXB3 and pSC11 are exemplary of vaccinia vectors (Chakrabarti et al., Molec. Cell. Biol. 5:3401-9 (1985) and Mackett et al., J. Virol. 49:857-864 (1984)). An example of a filamentous phage vector is an M13-derived vector such as M13mp18 or M13mp19.

For the expression of spider silk variant proteins in E. coli, bacteria-derived vectors are preferred, and plasmids derived from pBR322 are most preferred.

Optionally it may be desired to produce the silk variant protein as a secretion product of a transformed host, such as B. subtilis. Secretion of desired proteins into the growth media has the advantage of simplified and less costly purification procedures. It is well known in the art that secretion signal sequences are often useful in facilitating the active transport of expressible proteins across cell membranes. The creation of a transformed Bacillus host capable of secretion may be accomplished by the incorporation of a DNA sequence that codes for a secretion signal functional in the Bacillus production host on the expression cassette, between the expression-controlling DNA and the DNA encoding the silk variant protein and in reading frame with the latter. Examples of vectors enabling the secretion of a number of different heterologous proteins by B. subtilis have been taught and are described in Nagarajan et al., U.S. Pat. No. 4,801,537; Stephens et al., U.S. Pat. No. 4,769,327; and Biotechnology Handbook 2, Bacillus, C. R. Harwood, Ed., Plenum Press, New York (1989).

Secretion vectors of this invention include a regulatable promoter sequence which controls transcription, a sequence for a ribosome binding site which controls translation, and a sequence for a signal peptide which enables translocation of the peptide through the bacterial membrane and the cleavage of the signal peptide from the mature protein. Suitable vectors will be those which are compatible with the bacterium employed. For example, for B. subtilis such suitable vectors include E. coli-B. subtilis shuttle vectors. They will have compatible regulatory sequences and origins of replication. They will preferably be multicopy and have a selective marker gene, for example, a gene coding for antibiotic resistance. An example of such a vector is pTZ18R phagemid, obtainable from Pharmacia, Piscataway, N.J. 08854, which confers resistance to ampicillin in E. coli. The DNA sequences encoding the promoter, ribosome binding site and signal peptide may be from any single gene which encodes a secreted product.

The DNA sequences encoding the promoter and ribosome binding site may also be from a different gene than that encoding the signal peptide. The DNA sequences encoding the promoter, ribosome binding site and signal peptide can be isolated by means well known to those in the art and illustrative examples are documented in the literature. See Biotechnology Handbook 2, Bacillus, C. R. Harwood, Ed., Plenum Press, New York, N.Y. (1989). The promoters in the DNA sequences may be either constitutive or inducible and thus permit the resulting secretion vectors to be differentially regulated.

Promoters which are useful to drive expression of heterologous DNA fragments in E. coli and Bacillus are numerous and familiar to those skilled in the art. Virtually any promoter capable of driving the gene encoding a silk variant protein is suitable for the present invention, where the T7 promoters are preferred in E. coli and promoters derived from the SacB gene are preferred in Bacillus.

Termination control regions may also be derived from various genes native to E. coli or Bacillus hosts, or optionally other bacterial hosts. It will be appreciated by one of skill in the art that a termination control region may be unnecessary.

For introducing a polynucleotide of the present invention into a bacterial cell, known procedures can be used, such as transformation (e.g., using calcium-permeabilized cells), electroporation, or transfection using a recombinant phage virus (Sambrook et al., Molecular Cloning: A Laboratory Manual, volumes 1, 2, 3 (Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y., 1989)). Other known procedures can also be employed to obtain a recombinant host cell that expresses a heterologous spider silk protein according to the present invention, as will be apparent to those skilled in the art.

Design of Spider Silk Variant Amino Acid Sequences:

The design of the spider silk variant proteins was based on consensus amino acid sequences derived from the fiber forming regions of the natural spider silk dragline proteins of Nephila clavipes. Natural spider dragline consists of two different proteins that are co-spun from the spider's major ampullate gland. The amino acid sequences of both dragline proteins have been disclosed by Xu et al., Proc. Natl. Acad. Sci. U.S.A., 87, 7120 (1990) and Hinman and Lewis, J. Biol. Chem. 267, 19320 (1992), and will be identified hereinafter as Dragline Protein 1 (DP-1) and Dragline Protein 2 (DP-2).

The amino acid sequence of a fragment of DP-1 is repetitive and rich in glycine and alanine, but is otherwise unlike any previously known amino acid sequence. The repetitive nature of the protein and the pattern of variation among the individual repeats are emphasized by rewriting the sequence as in FIG. 1. The “consensus” sequence of a single repeat, viewed in this way, is:

A GQG GYG GLG XQG A GRG GLG GQG A GAAAAAAAGG (SEQ ID NO:1)

where X may be S, G, or N.
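As a reading aid, the consensus repeat can be written out for each permitted value of X. The following Python sketch is illustrative only: the function and variable names are assumptions introduced here, and the glycine-plus-alanine fraction it reports is simple counting on the printed consensus, not a measurement from the specification.

```python
from collections import Counter

# Consensus repeat of DP-1 as printed above (SEQ ID NO:1); spaces are removed below.
# The poly-alanine run is written as "A" * 7 to avoid a miscount.
TEMPLATE = ("A GQG GYG GLG XQG A GRG GLG GQG A G" + "A" * 7 + "GG").replace(" ", "")
X_CHOICES = ("S", "G", "N")  # residues permitted at position X


def expand_consensus(template=TEMPLATE, choices=X_CHOICES):
    """Return one concrete repeat sequence per permitted residue at X."""
    return [template.replace("X", residue) for residue in choices]


def gly_ala_fraction(seq):
    """Fraction of residues in seq that are glycine or alanine."""
    counts = Counter(seq)
    return (counts["G"] + counts["A"]) / len(seq)


variants = expand_consensus()
for variant in variants:
    print(variant, len(variant), round(gly_ala_fraction(variant), 2))
```

Each expanded repeat is 34 residues long, and glycine and alanine together account for roughly three quarters of the residues, consistent with the description of the sequence as rich in both.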

Examination of FIG. 1 shows that individual repeats differ from the consensus according to a pattern which can be generalized as follows: (1) The poly-alanine sequence varies in length from zero to seven residues. (2) When the entire poly-alanine sequence is deleted, so also is the surrounding sequence encompassing AGRGGLGGQGAGAₙGG (SEQ ID NO:2). (3) Aside from the poly-alanine sequence, deletions generally encompass integral multiples of three consecutive residues. (4) Deletion of GYG is generally accompanied by deletion of GRG in the same repeat. (5) A repeat in which the entire poly-alanine sequence is deleted is generally preceded by a repeat containing six alanine residues.

Synthetic analogs of DP-1 were designed to mimic both the repeating consensus sequence of the natural protein and the pattern of variation among individual repeats. Two analogs of DP-1 were designed and designated DP-1A and DP-1B. DP-1A is composed of a tandemly repeated 101-amino acid sequence listed in FIG. 2A. The 101-amino acid “monomer” (SEQ ID NO:20) comprises four repeats which differ according to the pattern (1)-(5) above. This 101-amino acid long peptide monomer (SEQ ID NO:20) is repeated from 1 to 16 times in a series of analog proteins. DP-1B was designed by reordering the four repeats within the monomer of DP-1A. This monomer sequence, shown in FIG. 3A, exhibits all of the regularities of (1)-(5) above. In addition, it exhibits a regularity of the natural sequence which is not shared by DP-1A, namely that a repeat in which both GYG and GRG are deleted is generally preceded by a repeat lacking the entire poly-alanine sequence, with one intervening repeat. The sequence of DP-1B matches the natural sequence more closely over a more extended segment than does DP-1A.
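The sizes implied by repeating the 101-residue monomer can be tabulated with a short calculation. This is a sketch under stated assumptions: it counts only the repeated monomer (three bases per residue) and ignores any start codon, vector-encoded tail, or linker residues, none of which are quantified in this passage. The names used are hypothetical.

```python
MONOMER_AA = 101                   # residues per DP-1A peptide monomer (from the text)
REPEAT_SERIES = (1, 2, 4, 8, 16)   # repeat numbers within the stated range of 1 to 16


def analog_sizes(monomer_aa=MONOMER_AA, repeats=REPEAT_SERIES):
    """Return (repeats, residues, coding base pairs) for each analog in the series."""
    return [(n, n * monomer_aa, 3 * n * monomer_aa) for n in repeats]


for n, residues, base_pairs in analog_sizes():
    print(f"{n:2d} repeats: {residues:4d} residues, {base_pairs:4d} bp of repeat-coding DNA")
```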

The amino acid sequence of a fragment of DP-2 is also repetitive and also rich in glycine and alanine, but is otherwise unlike any previously known amino acid sequence, and, aside from a region of consecutive alanine residues, different from DP-1. The repetitive nature of the protein and the pattern of variation among the individual repeats are emphasized by rewriting the sequence as in FIG. 8. The “consensus” sequence of a single repeat, viewed in this way, is:

[GPGGY GPGQQ]₃ GPSGPGS A₁₀ (SEQ ID NO:18)

Examination of FIG. 8 shows that individual repeats differ from the consensus according to a pattern which can be generalized as follows: (1) The poly-alanine-rich sequence varies in length from six to ten residues. (2) Aside from the poly-alanine sequence, individual repeats differ from the consensus repeat sequence by deletions of integral multiples of five consecutive residues consisting of one or both of the pentapeptide sequences GPGGY (SEQ ID NO:3) or GPGQQ (SEQ ID NO:4).
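The DP-2 consensus and the two variation rules above can be expressed as a small generator. The sketch below is a hypothetical illustration: it builds a repeat from a chosen list of pentapeptides and a poly-alanine length, and enforces only the two rules stated in the text. It does not reproduce the actual repeats of FIG. 8.

```python
PENTAPEPTIDES = ("GPGGY", "GPGQQ")       # SEQ ID NO:3 and SEQ ID NO:4
CONSENSUS_BLOCKS = PENTAPEPTIDES * 3     # [GPGGY GPGQQ] repeated three times
SPACER = "GPSGPGS"


def dp2_repeat(blocks=CONSENSUS_BLOCKS, n_ala=10):
    """Build one DP-2 style repeat from pentapeptide blocks and a poly-Ala run."""
    if any(block not in PENTAPEPTIDES for block in blocks):
        raise ValueError("only whole GPGGY or GPGQQ pentapeptides may be present")
    if not 6 <= n_ala <= 10:
        raise ValueError("poly-alanine length is described as six to ten residues")
    return "".join(blocks) + SPACER + "A" * n_ala


consensus = dp2_repeat()
# A hypothetical variant: one GPGGY pentapeptide deleted and a six-residue poly-Ala run.
variant = dp2_repeat(blocks=CONSENSUS_BLOCKS[1:], n_ala=6)
print(len(consensus), consensus)
print(len(variant), variant)
```

Because deletions remove whole pentapeptides, the lengths of such variants differ from the 47-residue consensus by multiples of five residues plus the change in poly-alanine length.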

Synthetic analogs of DP-2 were designed to mimic both the repeating consensus sequence of the natural protein and the pattern of variation among individual repeats. The analog DP-2A (SEQ ID NO:61) is composed of a tandemly repeated 119-amino acid sequence listed in FIG. 9A. The 119-amino acid “peptide monomer” comprises three repeats which differ according to the pattern (1)-(2) above. This 119-amino acid long peptide monomer is repeated from 1 to 16 times in a series of analog proteins.

Design of DNA encoding Spider Silk Variant Proteins:

DNA sequences encoding the designed analog amino acid sequences were devised according to the following criteria: (1) The DNA monomer was to be at least 300 bp in length; (2) within the monomer, repetitiveness of the sequence was minimized, with no repeated sequence longer than 17 bp and minimal repetitiveness of sequences longer than 10 bp; (3) where possible, codons were chosen from among the codons found preferentially in highly expressed genes of the intended host organism (E. coli) with preference for codons providing balanced A+T/G+C base ratios; and (4) predicted secondary structure of mRNA within the monomer was dominated by long-range interactions rather than shorter range base pairing. No attempt was made to minimize secondary structure of the mRNA.
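Criteria (2) and (3) lend themselves to a simple automated check on a candidate monomer. The following is a minimal sketch, not the procedure used by the inventors: it finds the length of the longest subsequence that occurs more than once (overlapping occurrences are counted, and reverse complements are ignored) and reports the G+C fraction. All function names are assumptions made for illustration.

```python
def longest_repeat(seq):
    """Length of the longest subsequence occurring at least twice in seq."""
    seq = seq.upper()
    best = 0
    k = 1
    # If a k-mer is repeated, its (k-1)-mer prefix is too, so k can be raised stepwise.
    while k < len(seq):
        seen = set()
        repeated = False
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in seen:
                repeated = True
                break
            seen.add(kmer)
        if not repeated:
            break
        best = k
        k += 1
    return best


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def meets_repeat_criterion(seq, max_repeat=17):
    """True if no repeated subsequence is longer than max_repeat bp (criterion 2)."""
    return longest_repeat(seq) <= max_repeat


toy = "ACGTACGTTT"  # toy sequence, not one of the disclosed monomers
print(longest_repeat(toy), gc_fraction(toy), meets_repeat_criterion(toy))
```

The quadratic scan is adequate for monomers of a few hundred base pairs; a suffix-array approach would be preferable for much longer sequences.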

Assembly of DP-1 and DP-2 Analog Genes:

Assembly of the synthetic dragline analog genes was accomplished by first assembling the appropriate DNA monomers followed by polymerization of these monomers to form the completed gene.

Synthetic DNA monomers, based on the consensus peptide monomers described above, were assembled from four to six cloned double stranded synthetic oligonucleotides. Each oligonucleotide was designed to encode a different portion of the peptide monomer. Briefly, the oligonucleotides were each cloned into separate suitable plasmid vectors containing an ampicillin resistance gene. A suitable E. coli host was transformed with the plasmids and screened for the presence of the correct vector by standard methods. After the oligonucleotides were cloned, the DNA monomer was sequentially assembled. Vectors containing individual oligonucleotides were digested and the plasmid DNA was purified by gel electrophoresis. Purified plasmid DNA fragments containing two different oligonucleotide sequences were then incubated under ligating conditions and the ligation products were used to transform a suitable E. coli host. These transformants comprised two of the oligonucleotide sequences linked in tandem. A similar procedure was followed for the creation of the full DNA monomer, comprising four to six of the oligonucleotides. Additional confirmation of the existence of the correct DNA insertions was obtained by direct DNA sequencing. The present invention provides several DNA monomers useful for the production of DP-1A and DP-1B analogs. In general, DNA monomers used to produce the analog DP-1B.16 are preferred since this construct avoids codons rarely used by the E. coli production host.

The assembled DNA monomer was then polymerized by a method essentially as described by Kempe et al. (Gene 39, 239 (1985)). This method consists of a series of successive doublings of the sequence of interest. Briefly, the DNA monomer containing the cloned oligonucleotides was digested with suitable restriction enzymes and incubated under annealing conditions followed by ligation to produce a series of constructs containing multiple repeats of the monomer. Ligation products were used to transform a suitable E. coli host and intact plasmids were selected on the basis of ampicillin resistance. Subsequent analysis of plasmid DNA by gel electrophoresis resulted in the identification of transformants containing plasmids with 2, 4, 8, and 16 tandem repeats of the DNA monomer. The protein products encoded by these constructs were analyzed by SDS polyacrylamide gel electrophoresis and detected and quantitated by immunochemical staining using a polyclonal antiserum raised in rabbits against a synthetic peptide analogous to a fragment of the natural protein.

Expression and Purification of Protein:

High level expression of the spider dragline protein analogs in E. coli was achieved by inserting the synthetic genes into plasmid vectors pFP202 and pFP204, which were derived from the well-known vector pET11a. In these vectors, the dragline protein-coding gene is inserted in such a manner as to be operably linked to a promoter derived from bacteriophage T7. This promoter is joined with sequences derived from the lac operator of E. coli, which confers regulation by lactose or analogs (IPTG). The E. coli host strain BL21(DE3) contains a lambda prophage which carries a gene encoding bacteriophage T7 RNA polymerase. This gene is controlled by a promoter which is also regulated by lactose or analogs. In addition to the phage T7 promoter, the vectors pFP202 and pFP204 provide sequences which encode a C-terminal tail containing six consecutive histidine residues appended to the dragline protein-coding sequences. This tail provides a means of affinity purification of the protein under denaturing conditions through its adsorption to resins bearing immobilized Ni ions.

DP-1 analog protein was produced by E. coli at levels of approximately 5-20% of total protein. Of this, approximately 20-40% was recovered in purified form as full-length protein. DP-2 analog protein was produced at approximately 5% of total cell protein, of which approximately 30% was recovered in purified form as full-length protein.
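The two percentages quoted for each analog can be multiplied to estimate the purified full-length protein as a share of total cell protein. The sketch below assumes the expression and recovery ranges can be combined end to end, which the text does not state explicitly; the function name is hypothetical.

```python
from fractions import Fraction


def recovered_fraction(expression_pct, recovery_pct):
    """Purified full-length protein as a fraction of total cell protein."""
    return Fraction(expression_pct, 100) * Fraction(recovery_pct, 100)


dp1_low = recovered_fraction(5, 20)     # low end of both DP-1 ranges
dp1_high = recovered_fraction(20, 40)   # high end of both DP-1 ranges
dp2 = recovered_fraction(5, 30)         # approximate DP-2 figures

for label, value in (("DP-1 low", dp1_low), ("DP-1 high", dp1_high), ("DP-2", dp2)):
    print(f"{label}: {float(value * 100):.1f}% of total cell protein")
```

On these assumptions, purified full-length DP-1 analog corresponds to roughly 1-8% of total cell protein, and purified DP-2 analog to roughly 1.5%.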

The following examples are meant to illustrate the invention but should not be construed as limiting it in any way.

EXAMPLES

GENERAL METHODS

The positions of the newly engineered restriction sites are indicated in the figures, and anyone skilled in the art can repeat these constructs with the available information.

The sources of the genes and the various vectors described throughout this application are as follows.

The anti-DP-1 and anti-DP-2 antisera were prepared by Multiple Peptide Systems, San Diego, Calif.

Restriction enzyme digestions, phosphorylations, ligations, transformations and other suitable methods of genetic engineering employed herein are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, volumes 1, 2, 3 (Cold Spring Harbor Laboratory: Cold Spring Harbor, N.Y., 1989), and in the instructions accompanying commercially available kits for genetic engineering.

Bacterial cultures and plasmids to carry out the present invention are available either commercially (from Novagen, Inc., Madison, Wis.) or from the E. coli Genetic Stock Center, Yale University, New Haven, Conn., the Bacillus Genetic Stock Center, Ohio State University, Columbus, Ohio, or the ATCC and, along with their sources, are identified in the text and examples which follow. Unless otherwise specified, standard reagents and solutions used in the following examples were supplied by Sigma Chemical Co. (St. Louis, Mo.).

Isolation of restriction fragments from agarose gels used the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.), and was performed as specified by the manufacturer.

Example 1 CONSTRUCTION OF THE SYNTHETIC GENES DP-1A.9 AND DP-1B.9 SEQ ID NO:81

Oligonucleotide Design and Cloning:

Synthetic genes encoding DP-1A.9 (SEQ ID NO:80) and DP-1B.9 (SEQ ID NO:81) were assembled from four double stranded synthetic oligonucleotides labeled L (SEQ ID NOs.:24, 25, and 26), M1 (SEQ ID NOs.:27, 28, and 29), M2 (SEQ ID NOs.:30, 31, and 37), and S (SEQ ID NOs.:33, 34, and 35) whose sequences are shown in FIGS. 4A-4D. The oligonucleotides were provided by the manufacturer (Midland Certified Reagents, Midland, Tex.) in double stranded form with 5′-OH groups phosphorylated. Methods of oligonucleotide synthesis, purification, phosphorylation, and annealing to the double stranded form are well known to those skilled in the art.

The four double stranded oligonucleotides were separately cloned by inserting them into a plasmid vector pFP510 (FIG. 5). This vector was derived from the plasmid pA126i (see FIG. 13A), the complete nucleotide sequence of which is provided in SEQ ID NO.:78 and FIG. 13B. Details of the structure of pA126i are not important for the construction, aside from the following essential features: (a) a replication origin active in E. coli; (b) a selectable genetic marker, in this case a gene conferring resistance to the antibiotic ampicillin; (c) sites for restriction endonucleases BamHI and BglII with no essential sequences between them; and (d) a third restriction site (PstI), located within the selectable marker, which produces cohesive ends incompatible with those produced by BamHI and BglII. For the construction of pFP510, DNA of plasmid pA126i was digested with endonucleases BamHI and BglII, then recovered by adsorption to glass beads in the presence of NaI (GENECLEAN® procedure; Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). To approximately 0.1 pmole of the eluted plasmid DNA was added 10 pmoles of the double stranded, phosphorylated oligonucleotide SF4/5 (FIG. 5). The mixture was incubated under ligation conditions with T4 polynucleotide ligase for 19 h at 4° C. Ligated DNA was then digested with endonuclease XmaI to linearize any remaining parental pA126i and used to transform E. coli SK2267 (obtained from the E. coli Genetic Stock Center, Yale University, New Haven, Conn.) which had been made competent by calcium treatment as described by Sambrook et al., op. cit. Plasmid DNA isolated from ampicillin resistant transformants was characterized by digestion separately with endonucleases ApaI and BamHI, and a transformant containing the desired plasmid was identified and designated pFP510.

DNA of plasmid pFP510 was digested with endonucleases SfiI and DraIII and purified by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). To approximately 0.1 pmole of the eluted plasmid DNA was added 10 pmoles of one of the double stranded, phosphorylated oligonucleotides L, M1, M2, or S (FIG. 4). The four plasmid-oligonucleotide mixtures were incubated under ligation conditions for 15 h at 4° C., then for 20 min at 23° C., and finally ligation was terminated by incubation for 3 min at 65° C. Aliquots of ligated DNA were used to transform E. coli SK2267 and ampicillin resistant transformants were selected. Clones containing oligonucleotides L, M1, and M2 shown in FIG. 4 were identified by screening plasmid DNA isolated from individual transformants with endonuclease AlwNI, a recognition site for which is present in the oligonucleotides. Clones containing oligonucleotide S were identified by screening plasmid DNA isolated from individual transformants with endonucleases BglI and DraIII. Plasmid DNA from putative clones was further characterized by digestion with endonucleases EcoRI, SfiI, and DraIII in order to establish that the oligonucleotide sequences were oriented correctly in the plasmid. The inserts were excised with endonucleases BamHI and BglII and analyzed by electrophoresis in 4% NuSieve agarose (FMC) to verify that the plasmid had acquired only a single copy of the oligonucleotide. Correct clones were identified and their plasmids were designated pFP521 (oligonucleotide L), pFP533 (oligonucleotide M1), pFP523 (oligonucleotide M2), and pFP524 (oligonucleotide S). DNA sequences of all four cloned oligonucleotides were verified by DNA sequencing.

DNA sequencing was carried out essentially according to procedures provided by the supplier (U.S. Biochemicals) with the Sequenase 2.0 kit for DNA sequencing with 7-deaza-GTP. Plasmid DNA was prepared using the Magic Minipreps kit (Promega). Template DNA was denatured by incubating 20 μl miniprep DNA in 40 μl (total volume) 0.2 M NaOH for 5 min at 23° C. The mixture was neutralized by adding 6 μl 2 M ammonium acetate (adjusted to pH 4.5 with acetic acid), and the DNA was precipitated by adding 0.15 mL ethanol, recovered by centrifugation, washed with cold 70% ethanol, and vacuum dried. Primers for sequencing were as follows:

SI1: 5′-ACGACCTCATCTAT (SEQ ID NO:5)

SI5: 5′-CTGCCTCTGTCATC (SEQ ID NO:6)

SI20: 5′-AATAGGCGTATCAC (SEQ ID NO:7)

Primers SI1 and SI5 anneal to sites on opposite strands in pA126i. SI5 primes synthesis into the sequences of interest from 31 bp beyond the BamHI site. SI1 primes synthesis on the opposite strand into the sequences of interest from 38 bp beyond the BglII site. For sequencing in the vector pFP206 (see below) the primer SI20, which anneals 25 bp beyond the BglII site, was substituted for SI1 (FIG. 12). Polyacrylamide gels for DNA sequencing were run at 52° C.
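Basic properties of the three sequencing primers can be derived directly from their sequences. The sketch below computes each primer's length, G+C count, and reverse complement (the template-strand sequence to which the primer anneals). It also applies the Wallace rule of thumb for short oligonucleotides, Tm ≈ 2(A+T) + 4(G+C) °C; that estimate is an addition made here for illustration and does not appear in the specification.

```python
PRIMERS = {
    "SI1": "ACGACCTCATCTAT",   # SEQ ID NO:5
    "SI5": "CTGCCTCTGTCATC",   # SEQ ID NO:6
    "SI20": "AATAGGCGTATCAC",  # SEQ ID NO:7
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the complementary strand, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C for a short oligonucleotide."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    gc = seq.count("G") + seq.count("C")
    print(name, len(seq), f"GC={gc}", f"Tm~{wallace_tm(seq)}C", reverse_complement(seq))
```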

Assembly of the Gene:

For assembly of subsequence M2L, plasmid pFP523 (M2) was digested with endonucleases PstI and DraIII, and plasmid pFP521 (L) was digested with endonucleases PstI and SfiI. Digested plasmid DNA was fractionated by electrophoresis in a 1.2% agarose (low melting, BioRad) gel. Ethidium bromide-stained bands containing the oligonucleotide sequences, identified by their relative sizes, were excised, the excised bands combined, and the DNA recovered from melted agarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The eluted combined DNA fragments were incubated under ligation conditions and an aliquot was used to transform E. coli W3110 (available from the E. coli Genetic Stock Center, Yale University, New Haven, Conn.). Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. A plasmid containing an insert of the expected size was identified and designated pFP525.

Assembly of subsequence M1S was accomplished in the same manner, starting with plasmids pFP533 (digested with PstI and DraIII) and pFP524 (digested with PstI and SfiI). A plasmid containing the M1S subsequence was identified and designated pFP531.

For assembly of the DNA monomer (M2LM1S), plasmid pFP525 (M2L) was digested with endonucleases PstI and DraIII, and plasmid pFP531 (M1S) was digested with endonucleases PstI and SfiI. Digested plasmid DNA was fractionated by electrophoresis in a 1.2% low melting agarose gel. Ethidium bromide-stained bands containing the M2L and M1S sequences, respectively, identified by their relative sizes, were excised, the excised bands combined, and the DNA recovered from melted agarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The eluted combined DNA fragments were incubated under ligation conditions and an aliquot was used to transform E. coli W3110. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. A plasmid containing an insert of the expected size was identified and designated pFP534. The DNA inserts in plasmids pFP523, pFP521, pFP533, pFP524, pFP525, pFP531, and pFP534 were verified by direct DNA sequencing as previously described.

Polymerization of the Gene:

The synthetic gene was extended by sequential doubling, starting with the monomer sequence in pFP534. For doubling any insert sequence, an aliquot of plasmid DNA was digested with endonucleases PstI and DraIII, and a separate aliquot of the same plasmid was digested with endonucleases PstI and SfiI. Digests were fractionated by electrophoresis on low melting agarose, and ethidium bromide stained fragments containing insert sequences were identified by their relative sizes. In some cases, the two fragments were not adequately separated, so it was necessary to cut the non-insert-containing fragment with a third enzyme, usually MluI.

Each of the two insert sequence-containing fragments has one end generated by endonuclease PstI. Annealing of these compatible single stranded ends and ligation results in reconstitution of the gene that confers ampicillin resistance, part of which is carried on each fragment. The other end of each fragment displays a single stranded sequence generated by either DraIII or SfiI. These sequences are, by design, complementary, and annealing and ligation results in a head-to-tail coupling of two insert sequences, with concomitant loss of both sites at the junction. The principle of this method of insert sequence doubling was described by Kempe et al. (Gene 39, 239-245 (1985)).

The two insert-containing fragments, purified by electrophoresis and recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.), were combined and incubated under ligation conditions. An aliquot was used to transform E. coli W3110. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. A plasmid containing an insert of the expected size was identified.

By this procedure a series of plasmids was constructed containing 2, 4, 8, and 16 tandem repeats of the DNA monomer sequence M2LM1S, encoding the series of DP-1A analogs. In addition, analogous methods were used to construct genes encoding the series of DP-1B analogs. For this purpose, subsequences SL (from pFP524 and pFP521) and M1M2 (from pFP533 and pFP523) were first constructed, then combined to form the monomer SLM1M2, which was polymerized as described. It should be apparent that similar methods can be used to assemble any combination of subsequences carried in the vector pFP510, or any other appropriate vector, provided that the subsequences are bounded by cleavage sites for restriction endonucleases that generate compatible ends (complementary single stranded ends or blunt ends). In addition to various monomer sequences, polymers of any number of repeats of the monomer sequence can be assembled in the same way, starting with plasmids containing inserts of different sizes.
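One way to plan the assembly of a polymer with an arbitrary number of repeats is to combine doublings with single joins of inserts of different sizes. The sketch below uses a binary (double-and-add) scheme chosen here for illustration; the specification states only that such polymers can be assembled from plasmids carrying inserts of different sizes and does not prescribe this particular order. Each tuple is one ligation joining inserts with the given repeat numbers.

```python
def assembly_plan(target_repeats):
    """List of (left_repeats, right_repeats) ligations that reach target_repeats.

    Starts from a one-repeat monomer plasmid and follows the binary digits of the
    target: every digit means one doubling, and a 1 digit adds one more monomer.
    """
    if target_repeats < 1:
        raise ValueError("target_repeats must be at least 1")
    steps = []
    current = 1
    for bit in bin(target_repeats)[3:]:  # binary digits after the leading 1
        steps.append((current, current))  # doubling: join two copies of the insert
        current *= 2
        if bit == "1":
            steps.append((current, 1))    # join the insert to a single monomer
            current += 1
    return steps


print(assembly_plan(16))  # four successive doublings
print(assembly_plan(6))   # a doubling, a monomer addition, then a doubling
```

For powers of two the plan reduces to the successive doublings described above, so 16 repeats require four ligations.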

Example 2 SYNTHETIC GENE DP-1B.16

A second set of genes encoding DP-1B, designated DP-1B.16 (SEQ ID NO.:82), was designed to reduce the number of codons which are rarely used in highly expressed E. coli genes, while encoding proteins of the same repeating sequence. The sequence of the DP-1B.16 peptide monomer is shown in FIG. 10A and in SEQ ID NO.:82.

Oligonucleotide Synthesis and Cloning:

Synthetic genes encoding DP-1B.16 (SEQ ID NO.:82) were assembled from four double stranded synthetic oligonucleotides whose sequences (SEQ ID NOs.:64, 65, 66; SEQ ID NOs.:67, 68, 69; SEQ ID NOs.:70, 71, 72; and SEQ ID NOs.:73, 74, 75) are shown in FIGS. 11A-11D. The oligonucleotides were provided by the manufacturer (Midland Certified Reagents, Midland, Tex.) in single stranded form with 5′-OH groups not phosphorylated. For annealing to the double stranded form, complementary single stranded oligonucleotides (667 pmoles each) were mixed in 0.2 mL buffer containing 0.01 M Tris-HCl, 0.01 M MgCl2, 0.05 M NaCl, 0.001 M dithiothreitol, pH 7.9. The mixture was heated in boiling water for 1 minute, then allowed to cool slowly to 23° C. over approximately 3 h.
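The quantities in the annealing step translate into concentrations by simple unit conversion. The following sketch computes the concentration of each strand and the amount of each buffer component in the 0.2 mL reaction. The helper names are hypothetical, and the figures are arithmetic on the values given in the text rather than additional experimental data.

```python
def concentration_uM(amount_pmol, volume_mL):
    """Concentration in micromolar; pmol per mL equals nmol per L (nM)."""
    return amount_pmol / volume_mL / 1000.0


def component_umol(molarity_M, volume_mL):
    """Micromoles of a component; mol/L times mL gives mmol, times 1000 gives umol."""
    return molarity_M * volume_mL * 1000.0


VOLUME_ML = 0.2
BUFFER_M = {"Tris-HCl": 0.01, "MgCl2": 0.01, "NaCl": 0.05, "dithiothreitol": 0.001}

strand_uM = concentration_uM(667, VOLUME_ML)
print(f"each strand: {strand_uM:.3f} uM")
for name, molarity in BUFFER_M.items():
    print(f"{name}: {component_umol(molarity, VOLUME_ML):.2f} umol in {VOLUME_ML} mL")
```

Each strand is therefore present at about 3.3 μM during annealing.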

The four double stranded oligonucleotides were separately cloned by inserting them into a plasmid vector pFP206 (FIG. 12). This vector was derived from the plasmid pA126i as illustrated in FIG. 12. Briefly, DNA of plasmid pA126i was digested with endonucleases BamHI and EcoRI, and the two fragments were separated by electrophoresis in a 1.2% agarose (low melting, BioRad) gel. The larger of the two fragments was excised from the ethidium bromide-stained gel and recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). To approximately 0.1 pmole of the eluted DNA fragment was added 10 pmoles of the double stranded, phosphorylated oligonucleotide SF31/32 (FIG. 12). The mixture was incubated under ligation conditions with T4 polynucleotide ligase for 8.5 h at 4° C. Ligated DNA was used to transform E. coli HB101, which had been made competent by calcium treatment. Plasmid DNA isolated from ampicillin resistant transformants was characterized by digestion separately with endonucleases HindIII, EcoRI, BglII, and BamHI, and a transformant containing the desired plasmid was identified and designated pFP206.

DNA of plasmid pFP206 was digested with endonucleases BamHI and BglII and purified by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). To approximately 0.1 pmole of the eluted plasmid DNA was added 10 pmoles of one of the double stranded oligonucleotides 1 (SEQ ID NOs.:64, 65, 66), 2 (SEQ ID NOs.:67, 68, 69), 3 (SEQ ID NOs.:70, 71, 72), or 4 (SEQ ID NOs.:73, 74, 75). The four plasmid-oligonucleotide mixtures were incubated under ligation conditions for 15 h at 4° C., then ligation was terminated by incubation for 3 min at 70° C. Ligated DNA was then digested with endonuclease HindIII to linearize any remaining parental pFP206. Aliquots of ligated DNA were used to transform E. coli HB101 and ampicillin resistant transformants were selected. Clones containing oligonucleotides 1, 2, 3, or 4 were identified by screening plasmid DNA isolated from individual transformants with endonucleases BamHI and PstI. In plasmids with inserts in the desired orientation, the shorter of two BamHI-PstI fragments of pFP206 is lengthened by the length of the cloned oligonucleotide. Plasmid DNA from putative clones was further characterized by digestion with endonucleases BamHI and BglII and analysis by electrophoresis in 3% NuSieve agarose (FMC), 1% Agarose (Sigma Chemical Co.) to verify that the plasmid had acquired only a single copy of the oligonucleotide in the correct orientation. Correct clones were identified and their plasmids were designated pFP636 (oligonucleotide 1), pFP620 (oligonucleotide 2), pFP641 (oligonucleotide 3), and pFP631 (oligonucleotide 4). Sequences of all four cloned oligonucleotides were verified by DNA sequencing as described above.

Assembly of the Gene:

For assembly of subsequence 1,2, plasmid pFP636 (1) was digested with endonucleases PstI and BamHI, and plasmid pFP620 (2) was digested with endonucleases PstI and BglII. Digested plasmid DNA was fractionated by electrophoresis in a 1.2% agarose (low melting, BioRad) gel. Ethidium bromide-stained bands containing the oligonucleotide sequences, identified by their relative sizes, were excised, the excised bands combined, and the DNA recovered from melted agarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The eluted combined DNA fragments were incubated under ligation conditions and an aliquot was used to transform E. coli HB101. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. A plasmid containing an insert of the expected size was identified and designated pFP647.

Assembly of subsequence 3,4 was accomplished in the same manner, starting with plasmids pFP641 (digested with PstI and BamHI) and pFP631 (digested with PstI and BglII). A plasmid containing the 3,4 subsequence was identified and designated pFP649.

For assembly of the DNA monomer (1,2,3,4), plasmid pFP647 (1,2) was digested with endonucleases PstI and BamHI, and plasmid pFP649 (3,4) was digested with endonucleases PstI and BglII. Digested plasmid DNA was fractionated by electrophoresis in a 1.2% low melting agarose gel. Ethidium bromide-stained bands containing the 1,2 and 3,4 sequences, respectively, identified by their relative sizes, were excised, the excised bands combined, and the DNA recovered from melted agarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The eluted combined DNA fragments were incubated under ligation conditions and an aliquot was used to transform E. coli HB101. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. A plasmid containing an insert of the expected size was identified and designated pFP652. The DNA insert in plasmid pFP652 was verified by direct DNA sequencing as described above.

Polymerization of the Gene:

The synthetic gene was extended by sequential doubling, starting with the monomer sequence in pFP652. For doubling any insert sequence, an aliquot of plasmid DNA was digested with endonucleases PstI and BamHI, and a separate aliquot of the same plasmid was digested with endonucleases PstI and BglII. Digests were fractionated by electrophoresis on low melting agarose, and ethidium bromide stained fragments containing insert sequences were identified by their relative sizes. The two insert-containing fragments, purified by electrophoresis and recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.), were combined and incubated under ligation conditions. At the third doubling, the two fragments in the BamHI digest were not adequately separated, so the eluted band contained both fragments. In this case a two-fold excess of the BglII-PstI fragment was used in the ligation. An aliquot of the ligated DNA was used to transform E. coli HB101. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. Plasmid containing insert of the expected size was identified.

By this procedure a series of plasmids was constructed containing 2, 4, 8, and 16 tandem repeats of the DNA monomer sequence 1 (SEQ ID NOs.:64, 65, 66), 2 (SEQ ID NOs.:67, 68, 69), 3 (SEQ ID NOs.:70, 71, 72), 4 (SEQ ID NOs.:73, 74, 75), encoding the series of DP-1B.16 analogs. These plasmids were designated pFP656 (2 repeats), pFP661 (4 repeats), pFP662 (8 repeats), and pFP665 (16 repeats), respectively.
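The doubling scheme works because BamHI (G^GATCC) and BglII (A^GATCT) leave compatible GATC cohesive ends: joining a BglII end to a BamHI end gives a hybrid junction that neither enzyme recuts, so each product still carries one BamHI site upstream and one BglII site downstream and can enter the next round. The Python sketch below models only the top strand of the insert cassette. The monomer string is a hypothetical placeholder, not the actual 303 bp sequence, and the function name is illustrative.

```python
BAMHI = "GGATCC"  # recognition site, cut as G^GATCC
BGLII = "AGATCT"  # recognition site, cut as A^GATCT


def double_cassette(cassette: str) -> str:
    """Join a BglII-cut copy (upstream) to a BamHI-cut copy (downstream)."""
    if not (cassette.startswith(BAMHI) and cassette.endswith(BGLII)):
        raise ValueError("cassette must be BamHI site + insert + BglII site")
    upstream = cassette[:-5]    # BglII cut leaves ...A on the top strand
    downstream = cassette[1:]   # BamHI cut leaves GATCC... on the top strand
    return upstream + downstream  # junction reads AGATCC, a hybrid site


MONOMER = "GCTGGTCAAGGT"  # hypothetical placeholder sequence

cassette = BAMHI + MONOMER + BGLII
series = {1: cassette}
copies = 1
for _ in range(4):  # four doublings: 2, 4, 8, 16 repeats
    cassette = double_cassette(cassette)
    copies *= 2
    series[copies] = cassette

for n, seq in series.items():
    print(n, "repeats:", seq.count(BAMHI), "BamHI site,", seq.count(BGLII), "BglII site")
```

Because the flanking sites are regenerated only at the outer ends, a BamHI plus BglII digest of any plasmid in the series releases the whole insert, which is how insert size is checked after each round.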

Example 3 SYNTHETIC GENE DP-2A

Oligonucleotide Synthesis and Cloning:

Synthetic genes encoding DP-2A (SEQ ID NO:61) were assembled from six double stranded synthetic oligonucleotides whose sequences are shown in FIGS. 7A-7F. The oligonucleotides were provided by the manufacturer (Midland Certified Reagents, Midland, Tex.) in double stranded form with 5′-OH groups not phosphorylated. The six double stranded oligonucleotides were separately cloned by inserting them into the plasmid vector pFP206.

DNA of plasmid pFP206 was digested with endonucleases BamHI and BglII and purified by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). To approximately 0.1 pmole of the eluted plasmid DNA was added 10 pmoles of one of the double stranded oligonucleotides A (SEQ ID NOs.:41, 42, 43), B (SEQ ID NOs.:44, 45, 46), C (SEQ ID NOs.:47, 48, 49), D (SEQ ID NOs.:50, 51, 52), E (SEQ ID NOs.:53, 54, 55), or F (SEQ ID NOs.:56, 57, 58). The six plasmid-oligonucleotide mixtures were incubated under ligation conditions for 15 h at 4° C., then ligation was terminated by incubation for 3 min at 70° C. Ligated DNA was then digested with endonuclease HindIII to linearize any remaining parental pFP206. Aliquots of ligated DNA were used to transform E. coli HB101 and ampicillin resistant transformants were selected. Clones containing oligonucleotides A, B, C, D, E, or F were identified by screening plasmid DNA isolated from individual transformants with endonucleases BamHI and PstI. In plasmids with inserts in the desired orientation, the shorter of two BamHI-PstI fragments of pFP206 is lengthened by the length of the cloned oligonucleotide. Plasmid DNA from putative clones was further characterized by digestion with endonucleases BamHI and BglII and analysis by electrophoresis in 3% NUSIEVE agarose (FMC), 1% agarose (Sigma Chemical Co.) to verify that the plasmid had acquired only a single copy of the oligonucleotide in the correct orientation. Correct clones were identified and their plasmids were designated pFP193 (oligonucleotide A), pFP194 (oligonucleotide B), pFP195 (oligonucleotide C), pFP196 (oligonucleotide D), pFP197 (oligonucleotide E), and pFP198 (oligonucleotide F).

Assembly of the Gene:

For assembly of subsequence AB, plasmid pFP193 (A) was digested with endonucleases PstI and PvuII, and plasmid pFP194 (B) was digested with endonucleases PstI and SmaI. Digested plasmid DNA was fractionated by electrophoresis in a 1.2% agarose (low melting, BioRad) gel. Ethidium bromide-stained bands containing the oligonucleotide sequences, identified by their relative sizes, were excised, the excised bands combined, and the DNA recovered from melted agarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The eluted combined DNA fragments were incubated under ligation conditions and an aliquot was used to transform E. coli HB101. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. Plasmid containing insert of the expected size was identified and designated pFP300 (AB).

Assembly of subsequence CD was accomplished in the same manner, starting with plasmids pFP195 (digested with PstI and SnaBI) and pFP196 (digested with PstI and SmaI). Plasmid containing the CD subsequence was identified and designated pFP578. Assembly of subsequence EF was accomplished in the same manner, starting with plasmids pFP197 (digested with PstI and SnaBI) and pFP198 (digested with PstI and SmaI). Plasmid containing the EF subsequence was identified and designated pFP583. The DNA inserts in plasmids pFP300, pFP578, and pFP583 were verified by direct DNA sequencing as described above.

Assembly of subsequence CDEF was accomplished similarly, starting with plasmids pFP578 (digested with PstI and PvuII) and pFP583 (digested with PstI and SmaI). Plasmid containing the CDEF subsequence was identified and designated pFP588.

For assembly of the DNA monomer (ABCDEF), plasmid pFP300 (AB) was digested with endonucleases PstI and PvuII, and plasmid pFP588 (CDEF) was digested with endonucleases PstI and SmaI. Digested plasmid DNA was fractionated by electrophoresis in a 1.2% low melting agarose gel. Ethidium bromide-stained bands containing the AB and CDEF sequences, respectively, identified by their relative sizes, were excised, the excised bands combined, and the DNA recovered from melted agarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The eluted combined DNA fragments were incubated under ligation conditions and an aliquot was used to transform E. coli HB101. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. Plasmid containing insert of the expected size was identified and designated pFP303. The DNA insert in plasmid pFP303 was verified by direct DNA sequencing.

Polymerization of the Gene:

The synthetic gene was extended by sequential doubling, starting with the monomer sequence in pFP303. For doubling any insert sequence, an aliquot of plasmid DNA was digested with endonucleases PstI and PvuII, and a separate aliquot of the same plasmid was digested with endonucleases PstI and SmaI. Digests were fractionated by electrophoresis on low melting agarose, and ethidium bromide stained fragments containing insert sequences were identified by their relative sizes. The two insert-containing fragments, purified by electrophoresis and recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.), were combined and incubated under ligation conditions. An aliquot of the ligated DNA was used to transform E. coli HB101. Ampicillin resistant transformants were selected. Plasmid DNA was isolated from several transformants, digested with endonucleases BamHI and BglII, and analyzed by agarose gel electrophoresis. Plasmid containing insert of the expected size was identified.

By this procedure a series of plasmids was constructed containing 2, 4, 8, and 16 tandem repeats of the DNA monomer sequence ABCDEF, encoding the series of DP-2A analogs. These plasmids were designated pFP304 (2 repeats), pFP596 (4 repeats), pFP597 (8 repeats), and pFP598 (16 repeats), respectively.

Example 4 EXPRESSION OF DP-1 AND DP-2 ANALOG GENES IN E. COLI

Immunoassay

For detection of DP-1 analog amino acid sequences, polyclonal antisera were raised in rabbits by immunization with a synthetic peptide matching the most highly conserved segment of the consensus repeat sequence of the natural protein. The peptide (sequence CGAGQGGYGGLGSQGAGRG-NH₂) (SEQ ID NO:8) was synthesized by standard solid phase methods (Multiple Peptide Systems, San Diego, Calif.) and coupled through its terminal Cys thiol to keyhole limpet hemocyanin via maleimidobenzoyl-N-hydroxysuccinimide ester. Similarly, for detection of DP-2 analog amino acid sequences, antisera were raised against a peptide of sequence CGPGQQGPGGYGPGQQGPS-NH₂ (SEQ ID NO:9), which reflects the consensus repeat sequence of the natural protein DP-2.

For the growth of cultures to assess production levels, 20 mL L broth (per liter: 10 g Bacto-Tryptone (Difco), 5 g Bacto-Yeast Extract (Difco), 5 g NaCl, pH adjusted to 7.0 with NaOH) containing 0.1 mg/mL ampicillin in a 125 mL baffled Erlenmeyer flask was inoculated at an absorbance (A₆₀₀ nm) of approximately 0.05 with cells eluted from an L-agar plate containing 0.1 mg/mL ampicillin, which had been grown overnight at 37° C. The culture was shaken at 37° C. until the A₆₀₀ nm reached approximately 1.0, at which time IPTG was added to a final concentration of 1 mM. Samples (0.5 mL) were taken immediately before IPTG addition and after an additional 3 h at 37° C. Cells were immediately recovered by centrifugation in a microfuge, supernatant was removed, and the cell pellet was frozen in dry ice and stored at −70° C.
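As a bench-side check of the quantities in this culture protocol, the following Python sketch scales the per-liter L broth recipe to the 20 mL culture volume and computes the mass of IPTG needed for a 1 mM final concentration. The molecular weight of IPTG (238.30 g/mol) is a standard value that is not stated in the text, and the function names are illustrative.

```python
IPTG_MW = 238.30  # g/mol, standard value for IPTG (not given in the text)

# L broth components in grams per liter, as listed above
L_BROTH_G_PER_L = {
    "Bacto-Tryptone": 10.0,
    "Bacto-Yeast Extract": 5.0,
    "NaCl": 5.0,
}


def scale_recipe(per_liter: dict, volume_ml: float) -> dict:
    """Grams of each component needed for the given volume."""
    return {name: grams * volume_ml / 1000.0 for name, grams in per_liter.items()}


def iptg_mg(final_mm: float, volume_ml: float) -> float:
    """Milligrams of IPTG for a final millimolar concentration.

    mmol/L x L gives mmol; mmol x g/mol gives mg.
    """
    return final_mm * (volume_ml / 1000.0) * IPTG_MW


broth_20ml = scale_recipe(L_BROTH_G_PER_L, 20.0)
induction_mg = iptg_mg(1.0, 20.0)

print(broth_20ml)
print(f"IPTG for 20 mL at 1 mM: {induction_mg:.2f} mg")
```

In practice IPTG would be added from a concentrated stock rather than weighed per flask; the calculation only shows the scale of the amounts involved.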

For analysis by polyacrylamide gel electrophoresis, cell pellets were thawed, suspended in 0.2 mL sample preparation buffer (0.0625 M Tris-HCl, pH 6.8, 2% w/v Na-dodecyl sulfate, 0.0025% w/v bromphenol blue, 10% v/v glycerol, 2.5% v/v 2-mercaptoethanol), and incubated in a boiling water bath for 5 min. Aliquots (15 μl) were applied to a 4-12% gradient polyacrylamide gel (Novex) and subjected to electrophoresis until the dye front was less than 1 cm from the bottom of the gel. The gel was stained with Coomassie Brilliant Blue. A second gel (6% acrylamide) was run with similar samples, then protein bands were transferred electrophoretically to a sheet of nitrocellulose, using an apparatus manufactured by Idea Scientific, Inc. The buffer for transfer contained (per liter) 3.03 g tris(hydroxymethyl)aminomethane, 14.4 g glycine, 0.1% w/v SDS, 25% v/v methanol.
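The transfer buffer is specified by weight per liter. A short Python sketch converts those weights to molar concentrations, which makes it easier to compare with other published blotting buffers. The molecular weights (Tris base 121.14 g/mol, glycine 75.07 g/mol) are standard values assumed here; they do not appear in the text.

```python
# Standard molecular weights in g/mol (assumed, not stated in the text)
MOLECULAR_WEIGHT = {"Tris": 121.14, "glycine": 75.07}

# Transfer buffer composition in grams per liter, from the text
TRANSFER_BUFFER_G_PER_L = {"Tris": 3.03, "glycine": 14.4}


def millimolar(grams_per_liter: float, mw: float) -> float:
    """Convert g/L to mM."""
    return grams_per_liter / mw * 1000.0


transfer_mm = {
    name: millimolar(grams, MOLECULAR_WEIGHT[name])
    for name, grams in TRANSFER_BUFFER_G_PER_L.items()
}

for name, conc in transfer_mm.items():
    print(f"{name}: {conc:.1f} mM")
```

The result is close to 25 mM Tris and 192 mM glycine.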

The nitrocellulose blot was stained immunochemically as follows. Protein binding sites on the sheet were blocked by incubation with “Blotto” (3% nonfat dry milk, 0.05% TWEEN 20, in Tris-saline (0.1 M Tris-HCl, pH 8.0, 0.9% w/v NaCl)) for 30 min at room temperature on a rocking platform. The blot was then incubated for 1 h with anti DP-1 serum or anti DP-2 serum, diluted 1:1000 in “Blotto”, washed with Tris-saline, and incubated for 1 h with horseradish peroxidase-conjugated goat anti-rabbit IgG serum (Kirkegaard and Perry Laboratories, Gaithersburg, Md.), diluted 1:1000 in “Blotto”. After again washing with Tris-saline, the blot was exposed to a solution of 18 mg 4-chloro-1-naphthol in 6 mL methanol, to which had been added 24 mL Tris-saline and 30 μl 30% H₂O₂.

For quantitation of DP-1 antigen production, cell extracts were prepared by either of two procedures.

Procedure 1: The cell pellet from 0.5 mL culture was resuspended in 0.084 mL 50 mM EDTA, pH 8.0, to which was then added 10 μl 10 mg/mL egg white lysozyme in the same buffer, 1 μl 2 mg/mL bovine pancreatic ribonuclease, and 5 μl 0.1 M phenylmethanesulfonyl fluoride in ethanol. After 15 min at 37° C., 1 μl 1 mg/mL DNase I was added, along with 3 μl 1 M MgCl₂, 1 M MgSO₄, and incubation was continued for 10 min at 37° C. The resulting lysate was clarified by centrifugation for 5 min in a microfuge, and the supernatant was diluted to 0.5 mL with Tris-saline.

Procedure 2: The cell pellet was resuspended in 0.5 mL of buffer 8.0G containing 6 M guanidine-HCl, 0.1 M NaH₂PO₄, 0.01 M Tris-HCl, 5 mM 2-mercaptoethanol, pH adjusted to 8.0 with NaOH. After thorough mixing and incubation for 1 h at 23° C., cell debris was removed by centrifugation for 15 minutes in a microfuge.

Aliquots (1 μl) of serial dilutions in Tris-saline (Procedure 1) or buffer 8.0G (Procedure 2) were spotted onto nitrocellulose, along with various concentrations of a standard solution of purified DP-1 8-mer (8 repeats of 101 amino acid residues). The nitrocellulose sheet was then treated as described above for the Western blot. The concentration of DP-1 antigen in each sample was estimated by matching the color intensity of a sample spot to that of one of the standard spots.
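The spot assay estimates antigen concentration by finding the sample dilution whose color matches a standard of known concentration, then multiplying back by the dilution factor. A minimal Python sketch of that back-calculation follows. The two-fold dilution series, the standard concentration, and the matched dilution are hypothetical illustration values; the text does not give the actual ones.

```python
def serial_dilutions(steps: int, fold: float = 2.0) -> list[float]:
    """Dilution factors for a serial dilution: 1, fold, fold**2, ..."""
    return [fold ** i for i in range(steps)]


def estimate_antigen(standard_mg_per_ml: float, dilution_factor: float) -> float:
    """Concentration in the undiluted extract, given the matched spot."""
    return standard_mg_per_ml * dilution_factor


# Hypothetical example: the 1:16 dilution of an extract gives the same
# color intensity as a 0.05 mg/mL standard spot.
dilutions = serial_dilutions(6)
matched_dilution = dilutions[4]
estimate = estimate_antigen(0.05, matched_dilution)

print(f"Matched dilution 1:{matched_dilution:g}, estimate {estimate:.2f} mg/mL")
```

Because matching is done by eye against discrete standards, the resolution of the estimate is limited by the spacing of the dilution series.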

Production Strains:

Vectors:

To construct bacterial strains for production of DP-1, cloned synthetic DP-1-coding DNA sequences were inserted into plasmid vector pFP202 (FIG. 6) or pFP204, both of which were derived from plasmid pFP200, which was, in turn, derived from the plasmids pET11a and pET9a of Studier et al., Methods in Enzymology, 185, 60 (1990). Plasmids pET9a and pET11a and host strains BL21, BL21(DE3), HMS174, and HMS174(DE3) were obtained from Novagen, Madison, Wis.

To construct the plasmid pFP200, DNA of plasmids pET9a and pET11a were digested with endonucleases EcoRI and AlwNI and the digests fractionated separately by electrophoresis in low-melting agarose. The appropriate ethidium bromide-stained bands (from pET9a, the band carrying the gene that confers resistance to kanamycin, and from pET11a, the band carrying the T7 promoter) were identified by size, excised and recovered from melted gel slices by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). Equivalent amounts of the purified DNA bands were combined and incubated under ligation conditions. An aliquot of the ligated DNA was used to transform E. coli BL21 and transformants were selected for resistance to kanamycin (50 μg/mL). Plasmid DNA from individual transformants was analyzed following digestion with endonuclease ClaI, and a correct one was identified and designated pFP200.

Next, DNA sequences encoding six consecutive histidine residues were inserted into pFP200. Such sequences were carried on a synthetic double stranded oligonucleotide (SF25/26) with the following sequence:

      G  S  H  H  H  H  H  H  S  R            (SEQ ID NO:10)
5′HO-GATCCCATCACCATCACCATCACTCTA              (SEQ ID NO:11)
         GGTAGTGGTAGTGGTAGTGAGATCTAG-OH 5′    (SEQ ID NO:12)

The amino acid sequence encoded by this oligonucleotide when it is inserted in the correct orientation into the BamHI site of pFP200 is shown in one-letter code above the DNA sequence. DNA of pFP200 was digested with endonuclease BamHI and recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). An aliquot of this digested DNA (approximately 0.02 pmoles) was mixed with oligonucleotide SF25/26 (10 pmoles), the 5′ termini of which had not been phosphorylated. After incubation under ligation conditions for 5 h at 4° C. and 20 min at 23° C., an aliquot was used to transform E. coli BL21. Transformants were selected for kanamycin resistance and plasmid DNA of individual transformants was analyzed following digestion with endonucleases EcoRI and BamHI. A correct plasmid was identified by the presence in the digest of a DNA band indicative of restoration of the BamHI site at the promoter-proximal end of the oligonucleotide sequence, resulting from insertion in the desired orientation. This plasmid was designated pFP202. Correct insertion of the oligonucleotide was verified by direct DNA sequencing as described above.

The plasmid vector pFP204 was constructed in an analogous manner, by inserting into pFP200 a synthetic double stranded oligonucleotide (SF29/30) with the following sequence:

      G  S  H  H  H  H  H  H                  (SEQ ID NO:13)
5′HO-GATCCCATCACCATCACCATCACTAAA              (SEQ ID NO:14)
         GGTAGTGGTAGTGGTAGTGATTTCTAG-OH 5′    (SEQ ID NO:15)

This oligonucleotide places a termination codon immediately following the six tandem His residues.
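The reading frames of the two His-tag oligonucleotides can be checked by translating their top strands in the context of a cut BamHI site (G^GATCC), which contributes a G upstream and GATCC downstream of the inserted duplex. The Python sketch below does this with a deliberately partial codon table that covers only the codons occurring in these two sequences; the function names are illustrative.

```python
# Partial codon table: only the codons that occur in these two inserts
CODONS = {
    "GGA": "G", "TCC": "S", "CAT": "H", "CAC": "H",
    "TCT": "S", "AGA": "R", "TAA": "*",
}

SF25_TOP = "GATCCCATCACCATCACCATCACTCTA"  # top strand of SF25/26
SF29_TOP = "GATCCCATCACCATCACCATCACTAAA"  # top strand of SF29/30


def in_bamhi_site(top_strand: str) -> str:
    """Top strand after ligation into a cut BamHI site: G + insert + GATCC."""
    return "G" + top_strand + "GATCC"


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODONS[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


pfp202_tag = translate(in_bamhi_site(SF25_TOP))
pfp204_tag = translate(in_bamhi_site(SF29_TOP))

print("SF25/26 in frame:", pfp202_tag)
print("SF29/30 in frame:", pfp204_tag)
```

The SF25/26 translation begins with the ten residues shown above its sequence and reads through into the regenerated site, whereas the SF29/30 translation ends directly after the sixth histidine, consistent with the termination codon described in the text.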

DP-1A.9 Strains:

Next, sequences encoding DP-1A were inserted into pFP202 at the BamHI site located between the T7 promoter and sequences encoding the His6 oligomer. DNA of plasmids pFP534 (encoding 101 aa DP-1A), pFP538 (encoding 2 repeats of 101 aa DP-1A), and pFP541 (8 repeats of 101 aa DP-1A) were digested with endonucleases BamHI and BglII, and pFP546 (16 repeats of 101 aa DP-1) was digested with BamHI, BglII, and EcoRI. The digests were fractionated by electrophoresis in low-melting agarose, and the ethidium bromide-stained band carrying the DP-1-encoding sequences was identified by size and excised. The excised gel bands were melted, and to each was added an aliquot of pFP202 DNA that had been digested with endonuclease BamHI. DNA was recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.) and incubated under ligation conditions for 2 h at 4° C., followed by 20 min at 23° C. An aliquot of ligated DNA was used to transform E. coli BL21(DE3), and transformants were selected for resistance to kanamycin.

Individual transformants were patched onto a sheet of cellulose acetate on the surface of LB agar containing kanamycin. After overnight growth, the cellulose acetate was transferred to a second plate on which a sheet of nitrocellulose had been placed on the surface of LB agar containing 1 mM IPTG. After incubation for 3 h at 37° C., the nitrocellulose sheet was removed from under the cellulose acetate, blocked with “Blotto”, and developed by immunochemical staining with anti-DP-1 serum as described above. Positive transformants, identified by blue color in this colony immunoassay, were picked from a replica master plate that had been inoculated at the same time as the immunoassay plate, with the same transformant colonies. The correct structure of plasmid DNA from positive transformants was verified following digestion with endonucleases BamHI and BglII. Transformants in which the DP-1-encoding insert was inserted backwards (as identified by the formation of appropriately sized bands in the digest) gave a positive reaction on colony immunoassay, but the color yield was markedly less intense than that of transformants with the insert in the correct orientation. Transformants containing plasmids with correctly oriented inserts were identified and designated FP3211 (1 repeat of 101 aa), FP3217 (2 repeats), FP3203 (8 repeats) and FP3206 (16 repeats).

DP-1 protein produced by strains FP3217, FP3203, and FP3206 was assayed by Western blot analysis as described above. All were shown to produce full-length protein of the expected size, detected by anti-DP-1 serum. In addition, a regular array of anti-DP-1-staining protein bands was observed, mainly at higher gel mobilities.

DP-1B.9 Strains:

E. coli strains for the production of DP-1B.9 were constructed in a similar fashion by transferring DNA fragments encoding DP-1B.9 (SEQ ID NO.:81) (derived by digestion with BamHI and BglII of plasmids pFP156 and pFP158, containing 8 and 16 repeats of the 303 bp DNA monomer, respectively) into plasmid pFP202. The resulting production strains were designated FP2121 (8 repeats) and FP2123 (16 repeats). Both strains were shown by Western blot analysis to produce full-length protein of the expected size.

DP-1B.16 Strains:

E. coli strains for the production of DP-1B.16 (SEQ ID NO.:82) were constructed in a similar fashion by transferring DNA fragments encoding DP-1B.16 (derived by digestion with BamHI and BglII of plasmids pFP662 and pFP665, containing 8 and 16 repeats of the 303 bp DNA monomer, respectively) into plasmid pFP204. The resulting production strains were designated FP3350 (8 repeats) and FP3356 (16 repeats). Both strains were shown by Western blot analysis to produce full-length protein of the expected size. Host cell FP3350 has been deposited with the ATCC under the terms of the Budapest Treaty and is identified by the ATCC number ATCC 69328 (deposited Jun. 15, 1993).

DP-2A Strains:

E. coli strains for the production of DP-2A (SEQ ID NO:61) were constructed in a similar fashion by transferring DNA fragments encoding DP-2A (derived by digestion with BamHI and BglII of plasmids pFP597 and pFP598, containing 8 and 16 repeats of the 357 bp DNA monomer, respectively) into plasmid pFP204. The resulting production strains were designated FP3276 (8 repeats) and FP3284 (16 repeats). Both strains were shown by Western blot analysis to produce full-length protein of the expected size.
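The expected size of each BamHI-BglII insert, and of the repeat region it encodes, follows directly from the monomer lengths given above (303 bp for the DP-1B genes, 357 bp for DP-2A). The Python sketch below tabulates these for the 8- and 16-repeat constructs. It counts only the repeat region and ignores any vector-encoded residues such as the His tag, so the full-length proteins are somewhat longer.

```python
# Monomer lengths in base pairs, from the text
MONOMER_BP = {"DP-1B": 303, "DP-2A": 357}


def repeat_region(monomer_bp: int, repeats: int) -> tuple[int, int]:
    """Return (insert length in bp, encoded residues) for a tandem array."""
    if monomer_bp % 3 != 0:
        raise ValueError("monomer length must preserve the reading frame")
    insert_bp = monomer_bp * repeats
    return insert_bp, insert_bp // 3


sizes = {
    (name, n): repeat_region(bp, n)
    for name, bp in MONOMER_BP.items()
    for n in (8, 16)
}

for (name, n), (bp, aa) in sizes.items():
    print(f"{name} x{n}: {bp} bp insert, {aa} residues in the repeat region")
```

A 303 bp monomer encodes 101 residues, which agrees with the "8 repeats of 101 amino acid residues" used to describe the DP-1 8-mer standard.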

Example 5 LARGE SCALE PRODUCTION, PURIFICATION AND QUANTITATION OF RECOMBINANT SILK VARIANT PROTEINS

Purification of DP-1A.9 (SEQ ID NO.:80):

Strain FP3203 was grown at 36° C. in a Fermgen fermenter (New Brunswick Scientific, New Brunswick, N.J.) in 10 l of a medium containing:

(NH₄)₂SO₄              3.0 g
MgSO₄                  4.5 g
Na citrate · 2H₂O      0.47 g
FeSO₄ · 7H₂O           0.25 g
CaCl₂ · 2H₂O           0.26 g
Thiamine-HCl           0.6 g
Casamino acids         200 g
Biotin                 0.05 g
K₂HPO₄                 19.5 g
NaH₂PO₄                9.0 g
Glycerol               100 g
L-Alanine              10.0 g
Glycine                10.0 g
Glucose                200 g
PPG                    5 mL
ZnSO₄ · 7H₂O           0.08 g
CuSO₄ · 5H₂O           0.03 g
MnSO₄ · H₂O            0.025 g
H₃BO₃                  0.0015 g
(NH₄)_(n)MO_(x)        0.001 g
CoCl₂ · 6H₂O           0.0006 g

The fermenter was inoculated with 500 mL overnight culture of FP3203 in the same medium. The pH was maintained at 6.8 by addition of 5 N NaOH or 20% H₃PO₄. Dissolved O₂ was maintained at approximately 50%. When the absorption at 600 nm had reached 10-15, production of DP-1 was induced by adding 5 g IPTG. After 3 h, cells were harvested by centrifugation and frozen. The yield was 314 g cell paste. Thawed cells (100 g paste) were suspended in 1000 mL buffer 8.0 G containing 6 M guanidine-HCl, 0.1 M NaH₂PO₄, 0.01 M Tris-HCl, 5 mM 2-mercaptoethanol, pH adjusted to 8.0 with NaOH. After stirring for 1 h at 23° C., the lysate was clarified by centrifugation at 10,000×g for 30 min, and the supernatant was filtered through Whatman No. 3 paper. To the filtrate was added 200 mL packed volume of Ni-nitrilotriacetic acid (NTA)-agarose (Qiagen, Inc.), which had been equilibrated with buffer 8.0 G, recovered by filtration, and drained. The lysate-resin slurry was stirred at 23° C. for 24 h, then the resin was recovered by filtration on Whatman No. 3 paper. The drained resin was suspended in 500 mL buffer 8.0 G and packed into a chromatography column (5 cm diameter). The column was washed with 500 mL buffer 8.0 G, then with successive 320 mL volumes of buffers of the same composition as buffer 8.0 G, but with the pH adjusted with NaOH to the following values: pH 6.3, 6.1, 5.9, 5.7, and 5.5. Effluent fractions of 40 mL were collected. DP-1 protein was located by immunoassay, as described above. Positive fractions were pooled and the pH was adjusted to 8.0 with NaOH. Immunoassay and Western blot analysis revealed that approximately 50% of the material containing DP-1 sequences was adsorbed to the resin and recovered in the pooled fractions. The remaining material apparently lacks the C-terminal oligo-histidine affinity tail, presumably as a result of premature termination of protein synthesis.

The concentration of 2-mercaptoethanol was adjusted to 17 mM, and the pooled material was stirred for 5 h at 23° C. This material was reapplied to the same Ni-NTA-agarose column, which had been re-equilibrated with buffer 8.0 G. The column was then washed with 200 mL buffer 8.0 G and 400 mL of buffer with a similar composition, but with a pH of 6.5, followed by 400 mL of a buffer composed of 0.1 M acetic acid adjusted to pH 6.5 with triethylamine, plus 5 mM 2-mercaptoethanol. DP-1 protein was eluted with 800 mL of a buffer composed of 0.1 M acetic acid adjusted to pH 5.0 with triethylamine, while 40 mL eluant fractions were collected. DP-1 protein was located by immunoassay. Positive fractions were pooled and the buffer was removed by lyophilization. Yield of lyophilized material was 100 mg, representing approximately 1% of the total protein present in the 100 g cell paste from which it was derived.

Amino acid analysis of the purified DP-1 is shown in Table I and is consistent with the predicted amino acid sequence, with impurities (as proteins of amino acid composition reflecting the overall composition of E. coli (Schaechter, M. et al., in Escherichia coli and Salmonella typhimurium, Neidhardt, F. C. (ed) Washington D.C., American Society for Microbiology, p. 5, (1987))) representing less than 7% of the total protein in the sample.

TABLE I
Amino Acid Analysis of DP-1A, 8-mer, Recovered from FP3203

Amino   Residues per Molecule          nMoles
Acid    Theoretical   Experimental     Experimental (Raw)
Gly     383           367              10.91
Ala     235           [235]            6.98
Glx     92            98               2.91
Leu     40            40               1.32
Ser     37            37               1.09
Tyr     24            25               0.75
Arg     18            22               0.66
Met     3             3                0.09
His     6             8.7              0.26
Asx     0             6                0.18
Thr     1             4                0.13
Val     0             4                0.13
Ile     0             3                0.10
Phe     °             °
Lys     0             3                0.10
Pro     0             0                0.00
Purity: 93%
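The experimental residues per molecule in Table I can be reproduced from the raw nmole values if alanine is taken as the normalization reference, that is, if the raw alanine value is set equal to its theoretical count of 235. That reading of the bracketed alanine entry is an assumption; the text does not state the normalization explicitly. The Python sketch below applies it to a subset of the Table I rows.

```python
# Raw nmoles for selected residues, from Table I
RAW_NMOL = {
    "Gly": 10.91, "Ala": 6.98, "Glx": 2.91,
    "Ser": 1.09, "Tyr": 0.75, "Arg": 0.66, "His": 0.26,
}


def residues_per_molecule(raw: dict, ref: str = "Ala", ref_count: float = 235.0) -> dict:
    """Scale raw nmoles so that the reference residue equals its theoretical count.

    Assumes the bracketed Ala value in the table marks the reference.
    """
    scale = ref_count / raw[ref]
    return {residue: nmol * scale for residue, nmol in raw.items()}


normalized = residues_per_molecule(RAW_NMOL)

for residue, count in normalized.items():
    print(f"{residue}: {count:.1f}")
```

Under this assumption the computed values for Gly, Glx, Ser, Tyr, and Arg round to the experimental entries listed in the table.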

Purification of DP-1B.16 (SEQ ID NO.:82):

Strain FP3350 was grown in 5 liters under conditions noted above. Thawed cell paste (154 g) was suspended in 1000 mL buffer 8.0 G and stirred for 2 h at 23° C. The lysate was clarified by centrifugation for 30 min at 10,000×g. To the supernatant was added 300 mL (packed volume) of Ni-NTA agarose equilibrated with buffer 8.0 G. The mixture was stirred at 23° C. for 18 h, then the resin was recovered by centrifugation at 1,000×g for 30 min. The resin was diluted to 800 mL with buffer 8.0 G, mixed, and allowed to settle. Supernatant was removed and the settling procedure was repeated. The settled resin was then diluted with an equal volume of buffer 8.0 G and packed into a chromatography column (5 cm diameter). The column was washed successively with (a) 1300 mL buffer 8.0 G, (b) 500 mL buffer 8.0 G containing 8 mM imidazole, (c) 100 mL buffer 8.0 G, and (d) 500 mL buffer 6.5 G (same composition as buffer 8.0 G, but with the pH adjusted to 6.5 with NaOH). DP-1B.16 protein was finally eluted with buffer 5.5 G (same composition as buffer 8.0 G, but with the pH adjusted to 5.5 with NaOH). Fractions containing DP-1B.16 were identified by spot immunoassay, pooled, and concentrated approximately 40-fold by ultrafiltration using Centriprep 30 centrifugal concentrators (Amicon). Protein was precipitated by the addition of 5 volumes of methanol, incubated for 16 h at 4° C., recovered by centrifugation, washed twice with methanol and vacuum dried.

The yield of dried material was 287 mg, representing approximately 2% of the total protein present in the 154 g cell paste from which it was derived. Amino acid analysis is shown in Table II and is consistent with the predicted amino acid sequence, with impurities (as proteins of amino acid composition reflecting the overall composition of E. coli) representing approximately 21% of the total protein in the sample.

TABLE II
Amino Acid Analysis of DP-1B.16, 8-mer, Recovered from FP3350

Amino   Residues per Molecule          nMoles
Acid    Theoretical   Experimental     Experimental (Raw)
Gly     383           338              26.27
Ala     235           [235]            18.25
Glx     92            105              8.13
Leu     40            54               4.22
Ser     37            32               2.44
Tyr     24            25               1.95
Arg     18            30               2.32
Met     3             4.2              0.32
His     6             24.2             1.88
Asx     0             19.2             1.49
Thr     1             9.4              0.73
Val     0             13.5             1.05
Ile     0             10.7             0.83
Phe     0             7.3              0.57
Lys     0             10.1             0.78
Pro     0             8.6              0.67
Purity: 79%

Purification of DP-2A (SEQ ID NO.:83):

Strain FP3276 was grown in 5 liters under conditions noted above, except that the growth medium was supplemented at inoculation with 0.375 g/l L-proline, and at induction with 0.1 g/l glycine and L-alanine and 0.0375 g/l L-proline. Thawed cell paste from two such fermentations (150 g and 140 g, respectively) was suspended in 1000 mL each buffer 8.0 G and stirred for 1 h at 23° C. The lysate was clarified by centrifugation for 30 min at 10,000×g. The supernatants were combined and mixed with 300 mL (packed volume) of Ni-NTA agarose equilibrated with buffer 8.0 G. The mixture was stirred at 23° C. for 18 h, then the resin was recovered by centrifugation at 1,000×g for 30 min. The resin was diluted to 800 mL with buffer 8.0 G, mixed, and allowed to settle. Supernatant was removed and the settling procedure was repeated twice. The settled resin was then diluted with an equal volume of buffer 8.0 G and packed into a chromatography column (5 cm diameter). The column was washed successively with (a) 1350 mL buffer 8.0 G, (b) 400 mL buffer 8.0 G containing 8 mM imidazole, (c) 100 mL buffer 8.0 G, and (d) 750 mL buffer 6.5 G. DP-2A (SEQ ID NO:61) protein was finally eluted with buffer 5.5 G. Fractions containing DP-2A were identified by spot immunoassay and pooled.

Of a total of 240 mL pooled fractions, 150 mL was removed and concentrated approximately 40-fold by ultrafiltration using Centriprep 30 centrifugal concentrators (Amicon). Protein was precipitated by the addition of 5 volumes of methanol, incubated for 16 h at 4° C., recovered by centrifugation, washed twice with methanol and vacuum dried. The yield of dried material was 390 mg.

The remaining 90 mL pooled column fractions was concentrated 8-fold using Centriprep 30 concentrators, diluted to the original volume with water and concentrated again. This procedure was repeated three additional times in order to remove guanidine to less than 5 mM. The material was finally lyophilized. The weight of lyophilized material was 160 mg. Thus the total yield of purified DP-2A (SEQ ID NO:61) was 550 mg, representing approximately 2% of the total protein present in the 290 g cell paste from which it was derived.
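The three purifications in this example report a recovered mass and the mass of cell paste it came from, so the yields can be put on the common basis of mg purified protein per gram of cell paste. The Python sketch below does that arithmetic using only the figures given in the text; the dictionary layout and names are illustrative.

```python
# (mg purified protein, g cell paste) as reported for each purification
PURIFICATIONS = {
    "DP-1A.9 (FP3203)": (100.0, 100.0),
    "DP-1B.16 (FP3350)": (287.0, 154.0),
    "DP-2A (FP3276)": (390.0 + 160.0, 150.0 + 140.0),  # two pooled portions
}


def mg_per_gram(mg_protein: float, g_paste: float) -> float:
    """Purified protein recovered per gram of wet cell paste."""
    return mg_protein / g_paste


yields = {name: mg_per_gram(mg, g) for name, (mg, g) in PURIFICATIONS.items()}

for name, value in yields.items():
    print(f"{name}: {value:.2f} mg per g cell paste")
```

These are recoveries after purification on a wet-paste basis, so they are lower bounds on what the cells actually produced.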

Amino acid analysis of a sample of the lyophilized material is shown in Table III and is consistent with the predicted amino acid sequence, with impurities (as proteins of amino acid composition reflecting the overall composition of E. coli) representing less than 4% of the total protein in the sample.

TABLE III
Amino Acid Analysis of DP-2A, 8-mer, Recovered from Strain FP3276

Amino   Residues per Molecule          nMoles
Acid    Theoretical   Experimental     Experimental (Raw)
Gly     373           351              16.98
Ala     185           [185]            8.95
Pro     169           158              7.64
Glx     130           93               4.51
Ser     51            48               2.35
Tyr     56            57               2.76
Met     3             2.0              0.10
His     6             9.2              0.45
Leu     1             1.8              0.09
Asx     0             ND               ND
Thr     1             ND               ND
Val     0             5.5              0.27
Ile     0             0                0.00
Phe     0             2.8              0.13
Lys     0             1.9              0.09
Arg     1             0                0.00
Purity: 96%

The present invention discloses the construction of several specific expression systems useful for the production of spider silk variant proteins. In order to leave no doubt that one of skill in the art can use the elements of the instant invention to produce the myriad of other spider silk variant proteins not specifically discussed, E. coli bacteria transformed with an expression vector (pFP204) devoid of synthetic spider silk variant DNA have been deposited with the ATCC under the terms of the Budapest Treaty and are identified by the ATCC number ATCC 69326. The expression vector pFP204 contained in the host cell E. coli HB101 comprises all the restriction sites needed to clone synthetic spider silk DNA of the instant invention and may be used to express any spider silk variant protein. In addition, the expression host strain E. coli BL21(DE3) transformed with a plasmid pFP674 carrying DP-1B.16 coding sequences (SEQ ID NO.:82) has been deposited with the ATCC under the terms of the Budapest Treaty and is identified by the ATCC number ATCC 69328. This strain can be used to produce DP-1B according to this invention, or cured of plasmid by methods well known to those skilled in the art and transformed with other expression vectors derived from pFP204.

Example 6 SYNTHESIS AND EXPRESSION OF DP-1 ANALOG IN BACILLUS SUBTILIS

For expression in Bacillus subtilis, a DP-1 analog-encoding gene from plasmid pFP141 was placed in a plasmid vector capable of replication in B. subtilis. DP-1 coding sequences were operably linked to a promoter derived from the levansucrase (lvs) gene of Bacillus amyloliquefaciens in such a manner that the N-terminal amino acid sequence coded by the levansucrase gene, which comprises a secretion signal sequence, was fused to the DP-1 sequence at its N-terminus. Gene fusions of this type have been shown, in some cases, to promote the production and secretion into the extracellular medium of foreign proteins (Nagarajan et al. U.S. Pat. No. 4,801,537).

As illustrated in FIG. 15, to prepare the DP-1 analog gene for transfer into the appropriate vector for B. subtilis, the endonuclease BglII site at the proximal end of the DP-1 coding sequence in plasmid pFP541 was first converted to an EcoRV site by inserting a synthetic oligonucleotide. DNA of plasmid pFP541 was digested with endonuclease BglII. Approximately 0.1 pmole of the linearized plasmid DNA was then incubated under ligation conditions with 10 pmoles of a synthetic double stranded oligonucleotide (SI9/10) with the following sequence:

5′HO-GATCAGATATCG (SEQ ID NO:16)
         TCTATAGCCTAG-OH 5′ (SEQ ID NO:17)
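The design of the SI9/10 linker can be checked with a short script. The following is a minimal sketch in Python, offered for illustration only; the function and variable names are not part of the original disclosure. It tests whether the two strands (SEQ ID NO:16 and SEQ ID NO:17, both written 5′ to 3′) pair over an 8-base duplex, whether each carries a 5′ GATC overhang of the kind left by BglII, and whether the duplex contains the EcoRV recognition sequence GATATC.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

top = "GATCAGATATCG"     # SEQ ID NO:16, 5'->3'
bottom = "GATCCGATATCT"  # SEQ ID NO:17, 5'->3'

OVERHANG = 4  # length of the single-stranded 5' extension on each strand

# The bases beyond each overhang should pair with one another.
strands_pair = reverse_complement(top[OVERHANG:]) == bottom[OVERHANG:]

# BglII (A^GATCT) leaves a 5' GATC overhang; the linker needs the same on both ends.
bglii_compatible = top[:OVERHANG] == "GATC" and bottom[:OVERHANG] == "GATC"

# EcoRV recognizes GATATC, which is palindromic.
has_ecorv = "GATATC" in top

print("strands pair:", strands_pair)
print("BglII-compatible ends:", bglii_compatible)
print("EcoRV site present:", has_ecorv)
```

All three checks are expected to report True for the sequences as printed above.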

Ampicillin resistant transformants of E. coli HMS174 were screened for plasmid DNA containing an EcoRV site provided by the synthetic oligonucleotide sequence. A plasmid containing an EcoRV site was identified and designated pFP169b (FIG. 15A). Next the DNA fragment carrying DP-1 coding sequences was isolated from pFP169b following digestion with endonucleases EcoRV and BamHI and separation of the resulting DNA fragments by agarose gel electrophoresis. A band of the appropriate size was excised from the ethidium bromide stained gel and DNA was recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.).

The plasmid vector pBE346 contains replication origins that conferautonomous replication in both E. coli and B. subtilis, as well asantibiotic resistance markers selectable in E. coli (ampicillin) and B.subtilis (kanamycin). In addition, the plasmid contains the lvs promoterand secretion signal operably linked to a staphylococcal protein A gene.The protein A gene is bounded by an EcoRV site at its proximal end,separating it from the lvs signal sequence, and a BamHI site at itsdistal end. The complete DNA sequence of pBE346 (FIG. 14A) is shown inSEQ ID NO.:79 and in FIGS. 14A-14F. In order to remove the protein Agene and allow for its replacement by the DP-1 gene, DNA of plasmidpBE346 was digested with endonucleases EcoRV and BamHI and theappropriate sized fragment was isolated following agarose gelelectrophoresis. DNA was recovered from the ethidium bromide stained gelband by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla,Calif.).

The DNA fragment purified from pFP169b (above) was mixed with the DNA fragment purified from pBE346 and incubated under ligation conditions. Ligated DNA was used to transform E. coli HMS174, and ampicillin resistant transformants were screened by examining plasmid DNA for the presence of appropriately sized fragments following digestion with endonucleases EcoRV and BamHI. A correct plasmid was identified and designated pFP191 (FIG. 15B).

DNA of plasmid pFP191 was used to transform competent cells of B.subtilis BE3010 (trp lys apr npr sacB). Transformants were selected forresistance to kanamycin. BE3010 was derived from B. subtilis BE1500,(trpC2, metB10, lys3, delta-aprE, delta-npr, sacB::ermC) which has beendescribed by Nagarajan et al., Gene, 114, 121, (1992) by transformingcompetent BE1500 cells with DNA from B. subtilis 1S53 (Bacillus GeneticStock Center, Ohio State University) and selecting for methionineprototrophs. Transformation of competent cells was carried outessentially as described by Nagarajan et al., U.S. Pat. No. 4,801,537.

Kanamycin resistant transformants of BE3010 were screened for theability to produce DP-1 by colony immunoassay. Colonies were grown on acellulose acetate disk placed on the surface of a plate containing TBABagar plus 5 micrograms per mL kanamycin. After colonies had developed at37° C., the cellulose acetate disk was transferred to a fresh platecontaining the same medium plus 0.8% sucrose, and placed over anitrocellulose disk which was placed on the surface of the agar. Afterincubation for 3 h at 37° C., the nitrocellulose disk was removed andstained with anti-DP-1 serum, peroxidase-conjugated goat anti-rabbitIgG, and 4-chloro-1-naphthol plus hydrogen peroxide as described above.Positively staining images of the colonies were observed, indicating theproduction and excretion of DP-1, compared to a negative control straincontaining a plasmid with no DP-1 coding sequences. The positive strainwas designated FP2193. FP2193 has been deposited with the ATCC under theterms of the Budapest Treaty and is identified by the ATCC number, ATCC69327.

The production and excretion of DP-1 by FP2193 was assayed in liquidculture. Strain FP2193 was grown in Medium B, containing, per liter, 33g Bacto-tryptone (Difco), 20 g yeast extract, 7.4 g NaCl, 12 mL 3N NaOH,0.8 g Na₂HPO₄, 0.4 g KH₂PO₄, 0.2% casamino acids (Difco), 0.5% glycerol,0.06 mM MnCl₂, 0.5 nM FeCl₃, pH 7.5. After growth for 3.5 h at 37° C.,production of DP-1 was induced by the addition of sucrose to 0.8%. After4 h additional incubation at 37° C., a sample of 0.5 mL was analyzed.Cells were removed by centrifugation. The upper 0.4 mL of supernatantwas removed and phenylmethane sulfonyl fluoride (PMSF) was added to 2mM. The residual supernatant was removed and discarded. The cell pelletwas suspended in 0.32 mL 50 mM EDTA, pH8.0, and lysed by the addition of0.08 mL 10 mg/mL egg white lysozyme in the same buffer, plus 2 mM PMSF.After incubation for 60 min at 37° C., 0.01 mL 2M MgCl2 and 0.001 mL 1mg/mL deoxyribonuclease I were added, and incubation continued for 5 minat 37° C. Aliquots (5 microliters) of each fraction, cell lysate andsupernatant, were analyzed by SDS gel electrophoresis andelectroblotting as described above. The blot was stained with anti-DP-1serum. Several positively staining bands were observed in thesupernatant fraction, and only a trace of positive band in the celllysate. The host strain BE3010 containing no DP-1 coding DNA sequencesproduced no positively staining bands. Thus B. subtilis strain FP2193was shown to produce DP-1 analog protein and to excrete it efficientlyinto the extracellular medium.

Example 7 DP-1B Production in Pichia pastoris

1. Synthetic Gene DP-1B.33

A set of genes encoding DP-1B, designated DP-1B.33, was designed to encode proteins of the same repeating sequence as DP-1B.9 and DP-1B.16, but to use predominantly codons favored in the highly expressed alcohol oxidase genes of Pichia pastoris.

a. Oligonucleotides

Synthetic genes encoding DP-1B.33 were assembled from four doublestranded synthetic oligonucleotides whose sequences are shown in FIG.16. The oligonucleotides were provided by the manufacturer (MidlandCertified Reagents, Midland, Tex.) in single-stranded form with 5′-OHgroups not phosphorylated. For annealing to the double-stranded form,complementary single stranded oligonucleotides (667 pmoles each) weremixed in 0.2 ml buffer containing 0.01 M Tris-HCl, 0.01 M MgCl₂, 0.05 MNaCl, 0.001 M dithiothreitol, pH 7.9. The mixture was heated in boilingwater for 1 min, then allowed to cool slowly to 23° C. overapproximately 3 h.

The four double-stranded oligonucleotides were separately cloned byinserting them into a plasmid vector pFP206. DNA of plasmid pFP206 wasdigested with endonucleases BamHI and BglII and purified by theGENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). Toapproximately 0.1 pmole of the eluted plasmid DNA was added 10 pmoles ofone of the double-stranded oligonucleotides P1, P2, P3, or P4. The fourplasmid-oligonucleotide mixtures were incubated under ligationconditions for 20 h at 4° C., then ligation was terminated by incubationfor 2 min at 70° C. Ligated DNA was then digested with endonucleaseHindIII to linearize any remaining parental pFP206. Aliquots of ligatedDNA were used to transform E. coli HB101 and ampicillin resistanttransformants were selected. Clones containing oligonucleotides P1, P2,P3, or P4 were identified by screening plasmid DNA isolated fromindividual transformants with endonucleases BamHI and PstI. In plasmidswith inserts in the desired orientation, the shorter of two BamHI-PstIfragments of pFP206 is lengthened by the length of the clonedoligonucleotide. Plasmid DNA from putative clones was furthercharacterized by digestion with endonucleases BamHI and BglII andanalysis by electrophoresis in 3.8% MetaPhor agarose (FMC) to verifythat the plasmid had acquired a single copy of the oligonucleotide inthe correct orientation. Correct clones were identified and theirplasmids were designated pFP685 (oligonucleotide P1, SEQ ID NOs.:84, 85,and 86), pFP690 (oligonucleotide P2, SEQ ID NOs.:87, 88, and 89), pFP701(oligonucleotide P3, SEQ ID NOs.:90, 91, and 92), and pFP693(oligonucleotide P4, SEQ ID NOs.:93, 94, and 95). Sequences of all fourcloned oligonucleotides were verified by DNA sequencing.

b. Assembly of the Gene

For assembly of subsequence P1,P2, plasmid pFP685 (P1, SEQ ID NOs.:84,85, and 86) was digested with endonucleases PstI and BamHI, and plasmidpFP690 (P2, SEQ ID NOs.:87, 88, and 89) was digested with endonucleasesPstI and BglII. Digested plasmid DNA was fractionated by electrophoresisin a 1.2% agarose (low melting, BioRad, Hercules, Calif.) gel. Ethidiumbromide-stained bands containing the oligonucleotide sequences,identified by their relative sizes, were excised, the excised bandscombined, and the DNA recovered from melted agarose by the GENECLEAN®procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The elutedcombined DNA fragments were incubated under ligation conditions and analiquot was used to transform E. coli HB101. Ampicillin resistanttransformants were selected. Plasmid DNA was isolated from severaltransformants, digested with endonucleases BamHI and BglII, and analyzedby agarose gel electrophoresis. Plasmid containing insert of theexpected size was identified and designated pFP707.

Assembly of subsequence P3,P4 was accomplished in the same manner as the subsequence P1,P2, starting, however, with plasmids pFP701 (digested with PstI and BamHI) and pFP693 (digested with PstI and BglII). Plasmid containing the P3,P4 subsequence was identified and designated pFP709.

For assembly of the DNA monomer (P1,P2,P3,P4), plasmid pFP707 (P1, P2)was digested with endonucleases PstI and BamHI, and plasmid pFP709(P3,P4) was digested with endonucleases PstI and BglII. Digested plasmidDNA was fractionated by electrophoresis in a 1.2% low melting agarosegel. Ethidium bromide-stained bands containing the P1,P2 and P3,P4sequences, respectively, identified by their relative sizes, wereexcised, the excised bands combined, and the DNA recovered from meltedagarose by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, LaJolla, CA). The eluted combined DNA fragments were incubated underligation conditions and an aliquot was used to transform E. coli HB101.Ampicillin-resistant transformants were selected. Plasmid DNA wasisolated from several transformants, digested with endonucleases BamHIand BglII, and analyzed by agarose gel electrophoresis. Plasmidcontaining an insert of the expected size was identified and designatedpFP711. The DNA insert in plasmid pFP711 was verified by direct DNAsequencing.

c. Polymerization of the Gene

The synthetic gene was extended by sequential doubling, starting withthe monomer sequence in pFP711. For doubling any insert sequence, analiquot of plasmid DNA was digested with endonucleases PstI and BamHI,and a separate aliquot of the same plasmid was digested withendonucleases PstI and BglII. Digests were fractionated byelectrophoresis on low melting agarose (BioRad, Calif.), and ethidiumbromide stained fragments containing insert sequences were identified bytheir relative sizes. The two insert-containing fragments, purified byelectrophoresis and recovered by the GENECLEAN® procedure (Bio101, Inc.,P.O. Box 2284, La Jolla, CA), were combined and incubated under ligationconditions. At the third doubling, the two fragments in the BamHI digestwere not adequately separated, so the eluted band contained bothfragments. In this case a two-fold excess of the BglII-PstI fragment wasused in the ligation. An aliquot of the ligated DNA was used totransform E. coli HB101. Ampicillin resistant transformants wereselected. Plasmid DNA was isolated from several transformants, digestedwith endonucleases BamHI and BglII, and analyzed by agarose gelelectrophoresis. Plasmid containing an insert of the expected size wasidentified.

By this procedure a series of plasmids was constructed containing 2, 4, 8, and 16 tandem repeats of the DNA monomer sequence P1,P2,P3,P4, encoding the series of DP-1B.16 analogs. These plasmids were designated pFP713 (2 repeats), pFP715 (4 repeats), pFP717 (8 repeats), pFP719 (16 repeats), and p723 (16 repeats), respectively.
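The doubling scheme relies on the fact that BamHI (G^GATCC) and BglII (A^GATCT) leave identical GATC overhangs, so a BamHI-cut end ligated to a BglII-cut end forms a hybrid junction that neither enzyme recognizes. The sketch below models only the insert as a text string and is illustrative, not the procedure as actually performed: the 15-base `CORE` is a hypothetical stand-in for the monomer (not the DP-1B.33 sequence), the vector and its PstI site are omitted, and the orientation (BglII site upstream, BamHI site downstream) is an assumption inferred from the digests described for the P1,P2 assembly.

```python
BAMHI = "GGATCC"  # BamHI cuts G^GATCC, leaving a 5' GATC overhang
BGLII = "AGATCT"  # BglII cuts A^GATCT, leaving the same 5' GATC overhang

# Hypothetical 15-base placeholder for the monomer; NOT the real gene sequence.
CORE = "GGTGCTGGCCAAGGT"

def double_insert(cassette):
    """Join a BamHI-cut copy (upstream) to a BglII-cut copy (downstream)."""
    assert cassette.startswith(BGLII) and cassette.endswith(BAMHI)
    upstream = cassette[:-5]   # top strand of the BamHI-cut piece ends in ...G
    downstream = cassette[1:]  # top strand of the BglII-cut piece begins GATCT...
    # The junction reads GGATCT, which is neither a BamHI nor a BglII site.
    return upstream + downstream

cassette = BGLII + CORE + BAMHI
history = [cassette]
for _ in range(4):  # four doublings: 1 -> 2 -> 4 -> 8 -> 16 copies
    cassette = double_insert(cassette)
    history.append(cassette)

for seq in history:
    print(seq.count(CORE), "repeats,",
          seq.count(BAMHI), "BamHI site,",
          seq.count(BGLII), "BglII site")
```

Because each product keeps exactly one BamHI site and one BglII site at its ends, the same pair of digests can be applied again at every round, which is what makes sequential doubling possible.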

2. Expression of DP-1 and DP-2 Analog Genes in Pichia pastoris

a. Growth and Assays

For the growth of cultures to assess production levels, 20 ml BMGY (per liter: 13.4 g yeast nitrogen base with ammonium sulfate (Difco), 10 g yeast extract, 20 g peptone, 0.4 mg biotin, 100 ml 1 M potassium phosphate buffer, pH 6.0, 10 ml glycerol) in a 125 ml baffled Erlenmeyer flask was inoculated at an absorbance (A₆₀₀ nm) of approximately 0.1 with cells eluted from a YPD agar plate (containing per liter: 10 g yeast extract (Difco), 20 g peptone, 20 g Bacto agar (Difco), 20 g D-glucose), which had been grown 2 days at 30° C. The culture was shaken at 30° C. until the A₆₀₀ nm reached approximately 25 (2 days), at which time cells were harvested by centrifugation (5 min at 1500×g). Supernatant was discarded and the cells resuspended in 6 ml BMMY (same as BMGY, except with 5 ml methanol per liter in place of glycerol). The culture was shaken at 30° C., and 0.005 ml methanol per ml culture was added every 24 h. Samples (1 ml) were taken immediately after resuspension and at intervals. Cells were immediately recovered by centrifugation in a microfuge (2 min at 6000×g). Where secretion was to be assayed, the top 0.7 ml supernatant was removed and frozen in dry ice (“culture supernatant” fraction). The drained cell pellet was frozen in dry ice and stored at −70° C.

Cells were lysed by shaking with glass beads. The thawed pellet was washed with 1 ml cold breaking buffer (50 mM sodium phosphate, pH 7.4, 1 mM EDTA, 5% (v/v) glycerol, 1 mM phenylmethane sulfonyl fluoride), and resuspended in 0.1 ml of the same buffer. Glass beads (acid washed, 425-600 microns; Sigma Chemical Co.) were added until only a meniscus was visible above the beads, and the tubes were subjected to mixing on a vortex type mixer for two intervals of 4 min, cooling on ice between. Cell breakage was verified by microscopic examination. After complete breakage, 0.5 ml breaking buffer was added and mixed. Debris and beads were pelleted in the microfuge (10 min), and 0.5 ml supernatant (soluble cell extract) removed. The debris was then extracted twice with additional 0.5 ml portions of breaking buffer, and the 0.5 ml supernatants combined with the first extract (“soluble cell extract” fraction). The debris was then extracted three times with 0.5 ml portions of buffer 6.5 G, containing 0.1 M sodium phosphate, 0.01 M Tris-HCl, 6 M guanidine-HCl, pH 6.5. The combined supernatants comprised the “insoluble cell extract” fraction.

For analysis by polyacrylamide gel electrophoresis, extracts werediluted approximately 1000-fold into sample preparation buffer (0.0625 MTris-HCl, pH 6.8, 2% w/v Na-dodecyl sulfate, 0.0025% w/v bromphenolblue, 10% v/v glycerol, 2.5% v/v 2-mercaptoethanol), and incubated in aboiling water bath for 5 min. Aliquots (5-15 μl) were applied to an 8%polyacrylamide gel (Novex) and subjected to electrophoresis until thedye front was less than 1 cm from the bottom of the gel. Protein bandswere transferred electrophoretically to a sheet of nitrocellulose, usingan apparatus manufactured by Idea Scientific, Inc. The buffer fortransfer contained (per liter) 3.03 g Trishydroxymethyl aminomethane,14.4 g glycine, 0.1% w/v SDS, 25% v/v methanol.

The nitrocellulose blot was stained immunochemically as follows. Protein binding sites on the sheet were blocked by incubation with “Blotto” (3% nonfat dry milk, 0.05% Tween 20, in Tris-saline (0.1 M Tris-HCl, pH 8.0, 0.9% w/v NaCl)) for 30 min at room temperature on a rocking platform. The blot was then incubated for 1 h with anti-DP-1 serum, diluted 1:1000 in “Blotto”, washed with Tris-saline, and incubated for 1 h with horseradish peroxidase-conjugated goat anti-rabbit IgG serum (Kirkegaard and Perry Laboratories, Gaithersburg, Md.), diluted 1:1000 in “Blotto”. After again washing with Tris-saline, the blot was exposed to a solution of 18 mg 4-chloro-1-naphthol in 6 ml methanol, to which had been added 24 ml Tris-saline and 30 μl 30% H₂O₂.

For quantitation of DP-1 antigen levels in various fractions, aliquots (1 μl) of serial dilutions in buffer 6.5 G were spotted onto nitrocellulose, along with various concentrations of a standard solution of purified DP-1 8-mer (8 repeats of 101 amino acid residues). The nitrocellulose sheet was then treated as described above for the Western blot. The concentration of DP-1 antigen in each sample was estimated by matching the color intensity of one of the standard spots.
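The arithmetic behind this spot-matching estimate can be sketched as follows. The example is illustrative only: the intensity values, the concentration units (ng/μl), and the function name are hypothetical and do not come from the disclosure, which describes visual matching of color intensity. The sketch picks the sample dilution whose spot most closely matches a standard spot and multiplies that standard's concentration by the dilution factor.

```python
def estimate_from_dot_blot(standards, sample_spots):
    """Estimate the undiluted sample concentration from matched spots.

    standards    -- list of (concentration, intensity) for the standard spots
    sample_spots -- list of (dilution_factor, intensity) for the sample series
    """
    best_gap = None
    best_estimate = None
    for dilution, sample_intensity in sample_spots:
        for concentration, standard_intensity in standards:
            gap = abs(sample_intensity - standard_intensity)
            if best_gap is None or gap < best_gap:
                best_gap = gap
                # Undiluted concentration = matched standard x dilution factor.
                best_estimate = concentration * dilution
    return best_estimate

# Hypothetical readings (arbitrary intensity units), for illustration only.
standards = [(100.0, 0.90), (50.0, 0.60), (25.0, 0.35), (12.5, 0.20)]
sample_spots = [(1, 1.00), (10, 0.95), (100, 0.62), (1000, 0.10)]

estimate = estimate_from_dot_blot(standards, sample_spots)
print("estimated concentration in undiluted sample:", estimate, "ng/ul")
```

In this made-up series the 100-fold dilution matches the 50 ng/μl standard most closely. In practice, spots outside the range of the standards carry little information, which is why a dilution series is spotted.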

b. Production Strains

(1) Vectors

To construct yeast strains for production of DP-1, cloned syntheticDP-1-coding DNA sequences were inserted into plasmid vectors which werederived from the plasmids pHIL-D4 (obtained from Phillips PetroleumCo.), or pPIC9 (obtained from Invitrogen Corp.). The structure ofpHIL-D4 is illustrated in FIG. 17. The plasmid includes a replicationorigin active in E. coli (but not in yeast) and ampicillin and kanamycinresistance markers that are selectable in E. coli. The kanamycinresistance marker also confers resistance to the antibiotic G418 inyeast. The plasmid includes regions homologous to both ends of thePichia pastoris AOX1 gene. The upstream region includes the AOX1promoter, expression from which is inducible by methanol. Sequences tobe expressed are inserted adjacent to the AOX1 promoter. Downstream aresequences encoding the AOX1 polyadenylation site and transcriptionterminator, the kanamycin marker, and the Pichia pastoris HIS4 gene. InpHIL-D4 no translated sequences are provided upstream from the sequencesto be expressed. The vector pPIC9 (FIG. 18) is similar to pHIL-D4,except it includes, adjacent to the AOX1 promoter, sequences encodingthe signal sequence and pro- sequence of the Saccharomyces cerevisiaealpha-mating factor gene. Also, pPIC9 lacks the kanamycin resistancegene of pHIL-D4.

A BamHI site in pPIC9, located immediately upstream of the 5′ end of the alpha-mating factor gene, was removed, and the sequences restored to those resembling the natural AOX1 gene, by polymerase chain reaction (PCR) (Perkin Elmer Cetus, Calif.). Fragments of pPIC9 were amplified separately using the following primer pairs:

LB1:5′-CAACTAATTATTCGAAACGATGAGATTTCC-3′ (SEQ ID NO.:98)

LB6:5′-CTGAGGAACAGTCATGTCTAAGG-3′ (SEQ ID NO.:99)

and

LB2:5′-GGAAATCTCATCGTTTCGAATAATTAGTTG-3′ (SEQ ID NO.:100)

LB5:5′-GAAACGCAAATGGGGAAACAACC-3′ (SEQ ID NO.:101)

PCR reactions were carried out in a Perkin Elmer Cetus DNA Thermal Cycler, using the Perkin Elmer Cetus GeneAmp kit with AmpliTaq® DNA polymerase. Instructions provided by the manufacturer were followed. The template DNA was approximately 0.2 ng pPIC9 DNA digested with endonucleases BglII and PvuII and subsequently recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The PCR program included (a) 1 min at 94° C.; (b) 4 cycles consisting of 1 min at 94° C., 2 min at 45° C., 1 min at 72° C.; (c) 25 cycles consisting of 1 min at 94° C., 1 min at 60° C., 1 min at 72° C. (extended by 10 sec each cycle); and (d) 7 min at 72° C. Products were recovered from the two separate PCR reactions by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.) and mixed in approximately equimolar amounts. This mixture was used as template for a second round of PCR using primers LB5 and LB6. For this reaction, the PCR program included (a) 1 min at 94° C.; (b) 25 cycles consisting of 1 min at 90° C., 1 min at 60° C., 1 min at 72° C. (extended 10 sec per cycle); and (c) 7 min at 72° C. The PCR product was recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.), then digested with endonucleases NsiI and EcoRI and again recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). The fragment was purified by electrophoresis in 1.5% low melting agarose (BioRad). DNA was recovered from the excised gel band by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). This fragment was substituted for the analogous fragment in pPIC9. For this purpose, pPIC9 was digested with endonucleases NsiI and EcoRI. The larger fragment was purified by electrophoresis in a 1.2% low melting agarose gel and recovered from the excised gel band by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.).
The PCR fragment and the large pPIC9 fragment were ligated under standard conditions, and the ligation was used to transform E. coli HB101. Ampicillin resistant transformants containing the correct plasmid were identified by screening plasmid DNA for the absence of the BamHI site. The correct plasmid was designated pFP734. The DNA sequence of pFP734 in the affected region, verified by DNA sequencing, is shown in FIG. 19 (SEQ ID NOs.:96 and 97).
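The two-round scheme described above works because the inner primers LB1 and LB2 cover the same 30 bases on opposite strands, so the two first-round products share an overlapping end and can prime on one another in the second round with the outer primers LB5 and LB6. This general strategy is often called overlap-extension PCR, a name the text itself does not use. The following illustrative sketch (helper names are not from the disclosure) checks that LB1 and LB2 are exact reverse complements and that neither contains a BamHI recognition sequence.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Primer sequences as given in the text (5'->3').
LB1 = "CAACTAATTATTCGAAACGATGAGATTTCC"  # SEQ ID NO.:98
LB2 = "GGAAATCTCATCGTTTCGAATAATTAGTTG"  # SEQ ID NO.:100

BAMHI_SITE = "GGATCC"

# The inner primers should describe the same duplex from opposite strands.
inner_primers_overlap = reverse_complement(LB1) == LB2

# The mutated region carried by the inner primers should lack a BamHI site.
bamhi_in_inner_primers = BAMHI_SITE in LB1 or BAMHI_SITE in LB2

print("LB1 and LB2 fully complementary:", inner_primers_overlap)
print("BamHI site present in inner primers:", bamhi_in_inner_primers)
```

The first check is expected to print True and the second False, in keeping with screening the final plasmid for the absence of the BamHI site.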

DNA sequences encoding six consecutive histidine residues were insertedinto pHIL-D4. Such sequences were carried on a synthetic double strandedoligonucleotide (SF47/48) with the following sequence:

SEQ ID NO.:102          M  G  S  H  H  H  H  H  H End
SEQ ID NO.:103 5′HO-AATTATGGGATCCCATCACCATCACCATCACT
SEQ ID NO.:104          TACCCTAGGGTAGTGGTAGTGGTAGTGATTAA-OH 5′

The amino acid sequence encoded by this oligonucleotide when it is inserted in the correct orientation into the EcoRI site of pHIL-D4 is shown in one-letter code above the DNA sequence. DNA of pHIL-D4 was digested with endonuclease EcoRI and recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). An aliquot of this digested DNA (approximately 0.02 pmoles) was mixed with oligonucleotide SF47/48 (10 pmoles), the 5′ termini of which had not been phosphorylated. After incubation under ligation conditions for 19 h at 4° C., an aliquot was used to transform E. coli HB101. Transformants were selected for ampicillin resistance and plasmid DNA of individual transformants was analyzed following digestion with endonucleases PvuII and BamHI. A correct plasmid was identified by the presence in the digest of a DNA band indicative of the BamHI site at the promoter-proximal end of the oligonucleotide sequence, resulting from insertion in the desired orientation. This plasmid was designated pFP684. Correct insertion of the oligonucleotide was verified by direct DNA sequencing.
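The reading frame of oligonucleotide SF47/48 can be checked by translating it in silico. The sketch below is illustrative only: it uses a deliberately partial codon table covering just the codons that occur here, and it assumes that after ligation the top strand is followed by the AATTC contributed by the EcoRI-cut vector end, which completes the TAA stop codon. Variable names are not from the disclosure.

```python
# Partial codon table: only the codons needed for this oligonucleotide.
CODONS = {"ATG": "M", "GGA": "G", "TCC": "S",
          "CAT": "H", "CAC": "H", "TAA": "*"}

COMPLEMENT = str.maketrans("ACGT", "TGCA")

top = "AATTATGGGATCCCATCACCATCACCATCACT"          # SEQ ID NO.:103, 5'->3'
bottom_3to5 = "TACCCTAGGGTAGTGGTAGTGGTAGTGATTAA"  # SEQ ID NO.:104, written 3'->5'

# Assumption: the EcoRI-cut vector end supplies AATTC after the insert.
context = top + "AATTC"

start = context.index("ATG")
protein = ""
for i in range(start, len(context) - 2, 3):
    residue = CODONS[context[i:i + 3]]
    if residue == "*":  # stop codon reached
        break
    protein += residue

# The bottom strand should pair with the top strand beyond its 4-base overhang.
strands_pair = top[4:].translate(COMPLEMENT) == bottom_3to5[:-4]

print("encoded peptide:", protein)
print("strands pair:", strands_pair)
print("BamHI site present:", "GGATCC" in top)
```

The translated peptide should read MGSHHHHHH, matching the one-letter sequence printed above the oligonucleotide, and the BamHI site used for screening lies immediately after the start codon.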

The plasmid vector pFP743 was constructed in an analogous manner, by substituting for sequences between NotI and EcoRI sites in pFP734 a synthetic double-stranded oligonucleotide (SF55/56) with the following sequence:

SEQ ID NO.:105        F  G  S  Q  G  A End
SEQ ID NO.:106 5′HO-AATTCGGATCCCAGGGTGCTTAA
SEQ ID NO.:107          GCCTAGGGTCCCACGAATTCCGG-OH 5′

DNA of pFP734 was digested with endonucleases NotI and EcoRI, then recovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). Oligonucleotide SF55/56 was inserted by ligation as described above. A correct plasmid was identified by the presence of a new fragment upon digesting plasmid DNA with endonucleases BamHI and BglII, and designated pFP743. Correct oligonucleotide insertion was verified by direct DNA sequencing.

(2) DP-1B.33 Strains

Next, sequences encoding DP-1B were inserted into pFP684 and pFP743 atthe respective unique BamHI sites located between the AOX1 promoter andsequences encoding the His6 oligomer. DNA (approximately 2 micrograms)of plasmids pFP717 (encoding 8 repeats of 101 aa DP-1B) and pFP719(encoding 16 repeats of 101 aa DP-1B) were digested with endonucleaseBamHI and BglII. The digests were fractionated by electrophoresis inlow-melting agarose, and the ethidium bromide-stained band carrying theDP-1B-encoding sequences was identified by size and excised. The excisedgel bands were melted, and to each was added an aliquot of pFP684 orpFP743 DNA that had been digested with endonuclease BamHI. DNA wasrecovered by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, LaJolla, Calif.) and incubated under ligation conditions for 3 h at 13° C.An aliquot of ligated DNA was used to transform E. coli HB101, andtransformants were selected for resistance to ampicillin.

Individual transformants were screened by digesting plasmid DNA withendonucleases BamHI and BglII. Correct plasmids were identified by thepresence of a fragment of the expected size containing the DP-1B.33gene. Plasmids derived from the vector pFP684 were designated pFP728(encoding 8 repeats of 101 amino acids DP-1B) and pFP732 (encoding 16repeats of 101 amino acids DP-1B). Those derived from the vector pFP743were designated pFP748 (encoding 8 repeats of 101 amino acids DP-1B) andpFP752 (encoding 16 repeats of 101 amino acids DP-1B).

Each of these plasmids was used to transfer the DP-1B gene to Pichia pastoris strain GS115 (his4) by spheroplast transformation essentially according to Cregg et al. (Mol. Cell. Biol. 5, 3376-3385 (1985)). The Pichia strain was grown in 200 ml YPD medium in a 500 ml baffled flask at 30° C. to A₆₀₀ nm of 0.3 to 0.4. Cells were harvested by centrifugation at 1500×g for 5 min at room temperature, then washed with 20 ml sterile water, followed by 20 ml fresh SED (1 M sorbitol, 25 mM EDTA, pH 8.0, 50 mM DTT), and 20 ml 1 M sorbitol. Cells were resuspended in 20 ml SCE (1 M sorbitol, 1 mM EDTA, 10 mM sodium citrate, pH 5.8), and zymolyase (15 ml stock solution containing 3 mg/ml Yeast Lytic Enzyme from Arthrobacter luteus (ICN Corp.; specific activity 100,000 u/g)) was added. Spheroplasting was monitored by diluting 0.2 ml aliquots into 0.8 ml 5% SDS and measuring A₆₀₀ nm. Digestion was continued until 70-80% spheroplasting was obtained. Spheroplasts were then harvested by centrifugation at 750×g for 10 min at room temperature, washed once with 10 ml 1 M sorbitol and once with 10 ml CAS (1 M sorbitol, 10 mM Tris-HCl, pH 7.5, 10 mM CaCl₂), and finally resuspended in 0.6 ml CAS. To 0.1 ml spheroplast suspension was added 1-5 micrograms linear DNA fragments in CAS, prepared by digesting plasmid DNA with endonuclease BglII and recovering the fragments by the GENECLEAN® procedure (Bio101, Inc., P.O. Box 2284, La Jolla, Calif.). PEG solution (1 ml containing 20% w/v PEG 3350 (Fisher Scientific Co.) in 10 mM Tris-HCl, pH 7.5, 10 mM CaCl₂) was added, mixed gently, and incubated 10 min at room temperature. Spheroplasts were recovered by centrifugation as above. The drained pellet was resuspended in 0.15 ml SOS (1 M sorbitol, 0.3 vol/vol medium YPD, 10 mM CaCl₂), incubated at room temperature 20 min, and diluted with 0.85 ml 1 M sorbitol.
Washed spheroplasts were mixed with 15 ml RD agarose (containing, per liter: 186 g sorbitol, 10 g agarose, 20 g D-glucose, 13.4 g yeast nitrogen base without amino acids (Difco), 0.4 mg biotin, 50 mg each L-glutamic acid, L-methionine, L-lysine, L-leucine, L-isoleucine, and 20 ml 50× His assay medium). The composition of 50× His assay medium was as follows (per liter): 50 g D-glucose, 40 g sodium acetate, 6 g ammonium chloride, 0.4 g D,L-alanine, 0.48 g L-arginine-HCl, 0.8 g L-asparagine monohydrate, 0.2 g L-aspartic acid, 0.6 g L-glutamic acid, 0.2 g glycine, 0.2 g D,L-phenylalanine, 0.2 g L-proline, 0.1 g D,L-serine, 0.4 g D,L-threonine, 0.5 g D,L-valine, 20 mg adenine sulfate, 20 mg guanine hydrochloride, 20 mg uracil, 20 mg xanthine, 1 mg thiamine-HCl, 0.6 mg pyridoxine-HCl, 0.6 mg pyridoxamine-HCl, 0.6 mg pyridoxal-HCl, 1 mg Ca pantothenate, 2 mg riboflavin, 2 mg nicotinic acid, 0.2 mg para-aminobenzoic acid, 0.002 mg biotin, 0.002 mg folic acid, 12 g monopotassium phosphate, 12 g dipotassium phosphate, 4 g magnesium sulfate, 20 mg ferrous sulfate, 4 mg manganese sulfate, 20 mg sodium chloride, 100 mg L-cystine, 80 mg D,L-tryptophane, 200 mg L-tyrosine. Spheroplasts in RD agarose (5 ml aliquots) were plated on RDB plates with the same composition as RD, but with 20 g agar (Difco) per liter in place of agarose.
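The spheroplasting check described above (diluting aliquots into SDS and reading the absorbance) can be turned into a percentage as sketched below. The formula is an assumption based on common practice, since the text does not state one: spheroplasts lyse in SDS, so the fraction converted is taken as the fractional drop in absorbance relative to the reading before enzyme was added. The time points and readings are hypothetical.

```python
def percent_spheroplasting(a600_before_enzyme, a600_now):
    """Percent of cells converted, estimated from the drop in A600 in 5% SDS.

    Assumed convention (not stated in the text): 100 * (1 - A_now / A_before).
    """
    return 100.0 * (1.0 - a600_now / a600_before_enzyme)

# Hypothetical readings: (minutes after enzyme addition, A600 in SDS).
readings = [(0, 0.40), (5, 0.30), (10, 0.18), (15, 0.10), (20, 0.06)]

baseline = readings[0][1]
harvest_time = None
for minutes, a600 in readings:
    pct = percent_spheroplasting(baseline, a600)
    print(f"{minutes:2d} min: {pct:5.1f}% spheroplasts")
    # The text continues digestion until 70-80% spheroplasting is obtained.
    if harvest_time is None and 70.0 <= pct <= 80.0:
        harvest_time = minutes

print("first time point within 70-80%:", harvest_time, "min")
```

With these made-up readings the 15 min time point gives about 75%, which falls in the target window.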

Plates were incubated at 30° C. for 3-4 days. Histidine prototrophictransformants were picked and patched onto MGY plates containing (perliter) 15 g agar, 13.4 g yeast nitrogen base without amino acids, 0.4 mgbiotin, 10 ml glycerol. Replicas were patched onto a sheet of celluloseacetate on the surface of MGY agar. After 2 days growth at 30° C., thecellulose acetate was transferred to a second plate on which a sheet ofnitrocellulose had been placed on the surface of MM agar with the samecomposition as MGY except 0.5% v/v methanol instead of glycerol. Afterincubation for 1-3 days at 30° C., the nitrocellulose sheet was removedfrom under the cellulose acetate, blocked with “Blotto”, and developedby immunochemical staining with anti-DP-1 serum as described above.Positive transformants, identified by blue color in this colonyimmunoassay, were picked from the MGY master plate. Transformants werealso tested for growth on MM agar. DP-1 protein produced by immunoassaypositive strains was assayed by Western blot analysis as describedabove. Several were shown to produce full-length protein of the expectedsize, detected by anti-DP-1 serum.

(3) DP-1B Production

DP-1B production by two such transformants is illustrated in FIGS. 20 and 21. FIG. 20 shows intracellular production, after various times of methanol induction, by strain YFP5028, which was derived by transforming Pichia pastoris GS115 with plasmid pFP728. This strain produces DP-1B species of 5 different sizes, as indicated by Western blot analysis, consisting of 8, 11, 13, 15 and greater than 20 repeats of the 101-amino acid residue monomer, respectively. It was identified among Pichia transformants by its ability to grow on YPD medium containing 0.5 mg/ml antibiotic G418, presumably indicative of the presence of multiple copies of the pFP728-derived insert. Total production of DP-1B was in excess of 1 g per liter culture. FIG. 21 shows the intracellular and extracellular production of DP-1B by strain YFP5093, which was derived by transformation of Pichia pastoris GS115 with plasmid pFP748. A significant fraction of the DP-1B produced was recovered from the extracellular culture supernatant.

Example 8 Demonstration of the Solutioning and Extrusion of Fibers from a Recombinantly Synthesized Analog to Spider Dragline Protein

For fiber spinning, DP-1B was purified by ion exchange chromatography. Frozen cell paste of E. coli FP3350 was thawed, suspended in 0.02 M Tris-HCl buffer, pH 8.0 (Buffer A), and lysed by passage through a Manton-Gaulin homogenizer (3-4 passes). Cell debris was removed by centrifugation, and the soluble extract was heated to 60° C. for 15 min. Insoluble material was again removed by centrifugation, and the soluble heat-treated extract was adjusted to pH 8.0, diluted to conductivity less than 0.025 M, and applied to a column of SP-Sepharose Fast Flow (Pharmacia, Piscataway, N.J.) equilibrated with Buffer A. The column was washed with Buffer A and eluted with a linear gradient from 0 to 0.5 M NaCl in Buffer A. DP-1B-containing fractions were identified by gel electrophoresis and immunoblotting as described above and pooled, and DP-1B was recovered by precipitation with 4 volumes of methanol at 0° C. and centrifugation. Pellets were washed three times with methanol and dried in vacuum. This material was found to be greater than 95% pure DP-1B as determined by amino acid analysis.

Briefly, the process of producing useful fibers from purified DP-1 protein involves the steps of dissolution in HFIP, followed by spinning of the solution through a spinneret orifice to obtain fibers. Physical properties such as tenacity, elongation, and initial modulus were measured using methods and instruments which conformed to ASTM Standard D 2101-82, except that the test specimen length was one inch. Five breaks per sample were made for each test.

Wet Spinning of Silk Fibers from HFIP Solution:

DP-1 was added to hexafluoroisopropanol (HFIP) in a polyethylene syringe to make a 20% solution of DP-1 in HFIP. The solution was mixed thoroughly by pumping back and forth between two syringes, and was allowed to stand overnight.

The 20% solids solution of DP-1 in HFIP was transferred to a syringe fitted with a sintered stainless steel DYNALLOG® filter (X7). The syringe was capped and periodically vented to disengage air bubbles trapped in the solution. A syringe pump was then used to force the solution through the filter and out of the syringe through a 5 mil diameter by 4 mil length orifice in a stainless steel spinneret, through a 3.5 inch air gap, and into a container of isopropanol at 20° C. The filament, which formed as the solution was extruded into the isopropanol at 8.3 fpm, was wound on a bobbin at 11 fpm.
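The spinning conditions above can be converted into metric units and into an approximate syringe-pump delivery rate. The sketch below is a rough calculation under a stated assumption, not a figure from the disclosure: it treats 8.3 fpm as the mean linear velocity of the solution through the 5 mil orifice, so that the volumetric flow is the orifice area times that velocity. The ratio of wind-up speed to extrusion speed is also computed.

```python
import math

MIL_TO_CM = 2.54e-3  # 1 mil = 0.001 inch = 0.00254 cm
FT_TO_CM = 30.48     # 1 foot = 30.48 cm

orifice_diameter_mil = 5.0  # from the text
extrusion_speed_fpm = 8.3   # from the text; assumed to be the mean velocity at the orifice
windup_speed_fpm = 11.0     # from the text

orifice_diameter_um = orifice_diameter_mil * MIL_TO_CM * 1e4
radius_cm = orifice_diameter_mil * MIL_TO_CM / 2.0
area_cm2 = math.pi * radius_cm ** 2

# cm^2 * cm/min = cm^3/min; multiply by 1000 for microliters per minute.
flow_ul_per_min = area_cm2 * extrusion_speed_fpm * FT_TO_CM * 1000.0

# Stretch applied between the spinneret and the bobbin.
spin_stretch = windup_speed_fpm / extrusion_speed_fpm

print(f"orifice diameter: {orifice_diameter_um:.0f} micrometers")
print(f"approximate delivery rate: {flow_ul_per_min:.1f} microliters/min")
print(f"wind-up / extrusion speed ratio: {spin_stretch:.2f}")
```

Under this assumption the pump delivers roughly 32 microliters per minute, and the filament is stretched about 1.3-fold between the orifice and the bobbin.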

The spun filament was allowed to stand in isopropanol overnight. Then, the filament was drawn while still wet to 2× its length at 150° C. in a tube furnace. The drawn fiber was then allowed to dry in room air.
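The spinning conditions above are given in mils, inches, and feet per minute. As a reading aid only, the following Python sketch restates them in SI units and derives two stretch ratios from the quoted speeds. The variable names are illustrative and do not come from the source, and the "overall" ratio rests on an assumption: that the wind-up stretch and the 2× hot draw simply multiply, with any length change during the overnight soak ignored.

```python
# Sketch only: restates the spinning conditions quoted in the text in SI units
# and derives stretch ratios. All names are illustrative, not from the source.

MIL_TO_UM = 25.4    # 1 mil = 0.001 inch = 25.4 micrometres
INCH_TO_MM = 25.4   # 1 inch = 25.4 millimetres
FT_TO_M = 0.3048    # 1 foot = 0.3048 metres

# Values quoted in the text
orifice_diameter_mil = 5.0
orifice_length_mil = 4.0
air_gap_inch = 3.5
extrusion_speed_fpm = 8.3
windup_speed_fpm = 11.0
hot_draw_ratio = 2.0  # "drawn while still wet to 2x its length"

# Unit conversions
orifice_diameter_um = orifice_diameter_mil * MIL_TO_UM
orifice_length_um = orifice_length_mil * MIL_TO_UM
air_gap_mm = air_gap_inch * INCH_TO_MM
extrusion_speed_m_min = extrusion_speed_fpm * FT_TO_M
windup_speed_m_min = windup_speed_fpm * FT_TO_M

# Stretch imposed between extrusion and wind-up (ratio of the two speeds)
jet_stretch_ratio = windup_speed_fpm / extrusion_speed_fpm

# ASSUMPTION: stretches multiply and the filament length does not change
# during the overnight soak; the source does not state an overall ratio.
overall_stretch_ratio = jet_stretch_ratio * hot_draw_ratio

print(f"orifice: {orifice_diameter_um:.1f} um x {orifice_length_um:.1f} um")
print(f"air gap: {air_gap_mm:.1f} mm")
print(f"extrusion: {extrusion_speed_m_min:.2f} m/min, "
      f"wind-up: {windup_speed_m_min:.2f} m/min")
print(f"jet stretch: {jet_stretch_ratio:.2f}x, "
      f"overall (assumed multiplicative): {overall_stretch_ratio:.2f}x")
```

The speed ratio is unit-independent, so it is computed directly from the fpm figures.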

Physical testing of samples of the dry fiber showed them to be 16.7 denier, with tenacities of 1.22 gpd, elongations of 103.3%, and initial moduli of 40.1 gpd. These figures indicate that the tenacity and modulus of the spun DP-1 spider silk variant fiber compare favorably with those of commercial textile fibers, and the fiber is therefore considered to be useful.
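The fiber properties above are reported in denier and grams per denier (gpd). The short Python sketch below converts the quoted figures to the tex-based units (dtex, cN/tex) and estimates the breaking load implied by the tenacity and linear density. It is offered as a reading aid under stated assumptions: the function names are illustrative, and the breaking load is simply tenacity multiplied by linear density, not a value reported in the source. No conversion to stress units such as MPa is attempted, because that would require a fiber density that the text does not give.

```python
# Sketch only: unit conversions for the fiber properties quoted in the text.
# Function names are illustrative and not taken from the source.

GF_TO_CN = 0.980665    # 1 gram-force = 0.980665 centinewton
DENIER_PER_TEX = 9.0   # denier = g per 9000 m, tex = g per 1000 m


def denier_to_dtex(denier: float) -> float:
    """Linear density: 1 denier = 10/9 dtex."""
    return denier * 10.0 / DENIER_PER_TEX


def gpd_to_cn_per_tex(gpd: float) -> float:
    """Specific stress: grams-force per denier to centinewtons per tex."""
    return gpd * GF_TO_CN * DENIER_PER_TEX


def breaking_load_gf(tenacity_gpd: float, denier: float) -> float:
    """Derived estimate: tenacity (gf/denier) times linear density (denier)."""
    return tenacity_gpd * denier


# Values quoted in the text
linear_density_denier = 16.7
tenacity_gpd = 1.22
initial_modulus_gpd = 40.1

print(f"linear density: {denier_to_dtex(linear_density_denier):.2f} dtex")
print(f"tenacity: {gpd_to_cn_per_tex(tenacity_gpd):.2f} cN/tex")
print(f"initial modulus: {gpd_to_cn_per_tex(initial_modulus_gpd):.1f} cN/tex")
print(f"implied breaking load: "
      f"{breaking_load_gf(tenacity_gpd, linear_density_denier):.1f} gf")
```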

107 34 amino acids amino acid unknown unknown peptide not provided 1 AlaGly Gln Gly Gly Tyr Gly Gly Leu Gly Xaa Gln Gly Ala Gly Arg 1 5 10 15Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Ala 20 25 30Gly Gly 15 amino acids amino acid unknown unknown peptide not provided 2Ala Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Gly Gly 1 5 10 15 5amino acids amino acid unknown unknown peptide not provided 3 Gly ProGly Gly Tyr 1 5 5 amino acids amino acid unknown unknown peptide notprovided 4 Gly Pro Gly Gln Gln 1 5 14 base pairs nucleic acid singlelinear DNA (genomic) not provided 5 ACGACCTCAT CTAT 14 14 base pairsnucleic acid single linear DNA (genomic) not provided 6 CTGCCTCTGT CATC14 14 base pairs nucleic acid single linear DNA (genomic) not provided 7AATAGGCGTA TCAC 14 19 amino acids amino acid unknown unknown peptide notprovided 8 Gly Arg Gly Ala Gly Gln Ser Gly Leu Gly Gly Tyr Gly Gly GlnGly 1 5 10 15 Ala Gly Cys 19 amino acids amino acid unknown unknownpeptide not provided 9 Cys Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr GlyPro Gly Gln Gln 1 5 10 15 Gly Pro Ser 10 amino acids amino acid unknownunknown peptide not provided 10 Gly Ser His His His His His His Ser Arg1 5 10 27 base pairs nucleic acid single linear DNA (genomic) notprovided 11 GATCCCATCA CCATCACCAT CACTCTA 27 27 base pairs nucleic acidsingle linear DNA (genomic) not provided 12 GATCTAGAGT GATGGTGATGGTGATGG 27 8 amino acids amino acid unknown unknown peptide not provided13 Gly Ser His His His His His His 1 5 27 base pairs nucleic acid singlelinear DNA (genomic) not provided 14 GATCCCATCA CCATCACCAT CACTAAA 27 27base pairs nucleic acid single linear DNA (genomic) not provided 15GATCTTTAGT GATGGTGATG GTGATGG 27 12 base pairs nucleic acid singlelinear DNA (genomic) not provided 16 GATCAGATAT CG 12 12 base pairsnucleic acid single linear DNA (genomic) not provided 17 GATCCGATAT CT12 47 amino acids amino acid unknown unknown peptide not provided 18 GlyPro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly 
Gly Tyr Gly 1 5 10 15Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro 20 25 30Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala Ala 35 40 45 651amino acids amino acid unknown unknown protein not provided 19 Gln GlyAla Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly 1 5 10 15 GlyTyr Gly Gly Leu Gly Gly Gln Gly Ala Gly Gln Gly Gly Tyr Gly 20 25 30 GlyLeu Gly Gly Gln Gly Ala Gly Gln Gly Ala Gly Ala Ala Ala Ala 35 40 45 AlaAla Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser 50 55 60 GlnGly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala 65 70 75 80Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly 85 90 95Ala Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala 100 105110 Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Asn 115120 125 Gln Gly Ala Gly Arg Gly Gly Gln Gly Ala Ala Ala Ala Ala Ala Gly130 135 140 Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly AlaGly 145 150 155 160 Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala AlaAla Ala Ala 165 170 175 Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu GlyGly Gln Gly Ala 180 185 190 Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser GlnGly Ala Gly Arg Gly 195 200 205 Gly Leu Gly Gly Gln Gly Ala Gly Ala AlaAla Ala Ala Ala Ala Gly 210 215 220 Gly Ala Gly Gln Gly Gly Leu Gly GlyGln Gly Ala Gly Gln Gly Ala 225 230 235 240 Gly Ala Ser Ala Ala Ala AlaGly Gly Ala Gly Gln Gly Gly Tyr Gly 245 250 255 Gly Leu Gly Ser Gln GlyAla Gly Arg Gly Gly Glu Gly Ala Gly Ala 260 265 270 Ala Ala Ala Ala AlaGly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu 275 280 285 Gly Gly Gln GlyAla Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln 290 295 300 Gly Ala GlyArg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala 305 310 315 320 AlaGly Gly Ala Gly Gln Gly Gly Leu Gly Gly Gln Gly Ala Gly Gln 325 330 335Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly 340 345350 Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Leu Gly Gly 355360 365 Gln Gly Ala Gly Ala Val Ala Ala Ala Ala Ala Gly 
Gly Ala Gly Gln370 375 380 Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly GlyGln 385 390 395 400 Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala GlyGln Arg Gly 405 410 415 Tyr Gly Gly Leu Gly Asn Gln Gly Ala Gly Arg GlyGly Leu Gly Gly 420 425 430 Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala AlaGly Gly Ala Gly Gln 435 440 445 Gly Gly Tyr Gly Gly Leu Gly Asn Gln GlyAla Gly Arg Gly Gly Gln 450 455 460 Gly Ala Ala Ala Ala Ala Gly Gly AlaGly Gln Gly Gly Tyr Gly Gly 465 470 475 480 Leu Gly Ser Gln Gly Ala GlyArg Gly Gly Gln Gly Ala Gly Ala Ala 485 490 495 Ala Ala Ala Ala Val GlyAla Gly Gln Glu Gly Ile Arg Gly Gln Gly 500 505 510 Ala Gly Gln Gly GlyTyr Gly Gly Leu Gly Ser Gln Gly Ser Gly Arg 515 520 525 Gly Gly Leu GlyGly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly 530 535 540 Gly Ala GlyGln Gly Gly Leu Gly Gly Gln Gly Ala Gly Gln Gly Ala 545 550 555 560 GlyAla Ala Ala Ala Ala Ala Gly Gly Val Arg Gln Gly Gly Tyr Gly 565 570 575Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala 580 585590 Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu 595600 605 Gly Gly Gln Gly Val Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly610 615 620 Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly ValGly 625 630 635 640 Ser Gly Ala Ser Ala Ala Ser Ala Ala Ala Ala 645 650101 amino acids amino acid unknown unknown protein not provided 20 GlyAla Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala 1 5 10 15Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala 20 25 30Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala 35 40 45Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly Ala Gly 50 55 60Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly 65 70 7580 Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly 85 9095 Gly Leu Gly Ser Gln 100 606 amino acids amino acid unknown unknownprotein not provided 21 Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala AlaAla Ala Ala Ala 1 5 10 15 Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly 
LeuGly Ser Gln Gly Ala 20 25 30 Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala GlyAla Ala Ala Ala Ala 35 40 45 Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu GlySer Gln Gly Ala Gly 50 55 60 Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala GlyGly Ala Gly Gln Gly 65 70 75 80 Gly Tyr Gly Gly Leu Gly Ser Gln Gly AlaGly Gln Gly Gly Tyr Gly 85 90 95 Gly Leu Gly Ser Gln Gly Ala Gly Arg GlyGly Gln Gly Ala Gly Ala 100 105 110 Ala Ala Ala Ala Ala Gly Gly Ala GlyGln Gly Gly Tyr Gly Gly Leu 115 120 125 Gly Ser Gln Gly Ala Gly Arg GlyGly Leu Gly Gly Gln Gly Ala Gly 130 135 140 Ala Ala Ala Ala Ala Ala AlaGly Gly Ala Gly Gln Gly Gly Leu Gly 145 150 155 160 Ser Gln Gly Ala GlyGln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly 165 170 175 Gly Ala Gly GlnGly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly 180 185 190 Gln Gly GlyTyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly 195 200 205 Gln GlyAla Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly 210 215 220 GlyTyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Leu Gly 225 230 235240 Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly 245250 255 Gln Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly Ala Gly Ala Ala260 265 270 Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly LeuGly 275 280 285 Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly SerGln Gly 290 295 300 Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala Ala AlaAla Ala Gly 305 310 315 320 Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu GlySer Gln Gly Ala Gly 325 330 335 Arg Gly Gly Leu Gly Gly Gln Gly Ala GlyAla Ala Ala Ala Ala Ala 340 345 350 Ala Gly Gly Ala Gly Gln Gly Gly LeuGly Ser Gln Gly Ala Gly Gln 355 360 365 Gly Ala Gly Ala Ala Ala Ala AlaAla Gly Gly Ala Gly Gln Gly Gly 370 375 380 Tyr Gly Gly Leu Gly Ser GlnGly Ala Gly Gln Gly Gly Tyr Gly Gly 385 390 395 400 Leu Gly Ser Gln GlyAla Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala 405 410 415 Ala Ala Ala AlaGly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly 420 425 430 Ser Gln GlyAla Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala 435 440 445 Ala AlaAla 
Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser 450 455 460 GlnGly Ala Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly 465 470 475480 Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln 485490 495 Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln500 505 510 Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln GlyGly 515 520 525 Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly LeuGly Gly 530 535 540 Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Ala Gly GlyAla Gly Gln 545 550 555 560 Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln GlyAla Gly Ala Ala Ala 565 570 575 Ala Ala Ala Gly Gly Ala Gly Gln Gly GlyTyr Gly Gly Leu Gly Ser 580 585 590 Gln Gly Ala Gly Gln Gly Gly Tyr GlyGly Leu Gly Ser Gln 595 600 605 101 amino acids amino acid unknownunknown protein not provided 22 Gly Ala Gly Gln Gly Gly Tyr Gly Gly LeuGly Ser Gln Gly Ala Gly 1 5 10 15 Arg Gly Gly Leu Gly Gly Gln Gly AlaGly Ala Ala Ala Ala Ala Ala 20 25 30 Ala Gly Gly Ala Gly Gln Gly Gly LeuGly Ser Gln Gly Ala Gly Gln 35 40 45 Gly Ala Gly Ala Ala Ala Ala Ala AlaGly Gly Ala Gly Gln Gly Gly 50 55 60 Tyr Gly Gly Leu Gly Ser Gln Gly AlaGly Arg Gly Gly Gln Gly Ala 65 70 75 80 Gly Ala Ala Ala Ala Ala Ala GlyGly Ala Gly Gln Gly Gly Tyr Gly 85 90 95 Gly Leu Gly Ser Gln 100 606amino acids amino acid unknown unknown protein not provided 23 Gly AlaGly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly 1 5 10 15 ArgGly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala 20 25 30 AlaGly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln 35 40 45 GlyAla Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly 50 55 60 TyrGly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln Gly Ala 65 70 75 80Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly 85 90 95Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly 100 105110 Ser Gln Gly Ala Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala 115120 125 Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser130 135 140 Gln Gly 
Ala Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala GlyGly 145 150 155 160 Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln GlyAla Gly Arg 165 170 175 Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala AlaGly Gly Ala Gly 180 185 190 Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln GlyAla Gly Gln Gly Gly 195 200 205 Tyr Gly Gly Leu Gly Ser Gln Gly Ala GlyArg Gly Gly Leu Gly Gly 210 215 220 Gln Gly Ala Gly Ala Ala Ala Ala AlaAla Ala Gly Gly Ala Gly Gln 225 230 235 240 Gly Gly Leu Gly Ser Gln GlyAla Gly Gln Gly Ala Gly Ala Ala Ala 245 250 255 Ala Ala Ala Gly Gly AlaGly Gln Gly Gly Tyr Gly Gly Leu Gly Ser 260 265 270 Gln Gly Ala Gly ArgGly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala 275 280 285 Ala Gly Gly AlaGly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly 290 295 300 Ala Gly GlnGly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg 305 310 315 320 GlyGly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Ala 325 330 335Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly 340 345350 Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr 355360 365 Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly370 375 380 Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr GlyGly 385 390 395 400 Leu Gly Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly GlyLeu Gly Ser 405 410 415 Gln Gly Ala Gly Arg Gly Gly Leu Gly Gly Gln GlyAla Gly Ala Ala 420 425 430 Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln GlyGly Leu Gly Ser Gln 435 440 445 Gly Ala Gly Gln Gly Ala Gly Ala Ala AlaAla Ala Ala Gly Gly Ala 450 455 460 Gly Gln Gly Gly Tyr Gly Gly Leu GlySer Gln Gly Ala Gly Arg Gly 465 470 475 480 Gly Gln Gly Ala Gly Ala AlaAla Ala Ala Ala Gly Gly Ala Gly Gln 485 490 495 Gly Gly Tyr Gly Gly LeuGly Ser Gln Gly Ala Gly Gln Gly Gly Tyr 500 505 510 Gly Gly Leu Gly SerGln Gly Ala Gly Arg Gly Gly Leu Gly Gly Gln 515 520 525 Gly Ala Gly AlaAla Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly 530 535 540 Gly Leu GlySer Gln Gly Ala Gly Gln Gly Ala Gly Ala Ala Ala Ala 545 550 555 560 AlaAla Gly Gly Ala Gly Gln Gly Gly 
Tyr Gly Gly Leu Gly Ser Gln 565 570 575Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala 580 585590 Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln 595 600 60593 base pairs nucleic acid single linear DNA (genomic) not provided 24GGGCCGGACG TGGTGGCCTT GGTGGTCAGG GTGCTGGCGC GGCAGCCGCT GCGGCAGCTG 60GTGGTGCTGG TCAGGGCGGT CTTGGCTCAC AAG 93 93 base pairs nucleic acidsingle linear DNA (genomic) not provided 25 GTGAGCCAAG ACCGCCCTGACCAGCACCAC CAGCTGCCGC AGCGGCTGCC GCGCCAGCAC 60 CCTGACCACC AAGGCCACCACGTCCGGCCC CTT 93 31 amino acids amino acid unknown unknown peptide notprovided 26 Gly Ala Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala AlaAla 1 5 10 15 Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly SerGln 20 25 30 81 base pairs nucleic acid single linear DNA (genomic) notprovided 27 GGGCCGGTCA AGGCGCTGGT GCAGCAGCAG CTGCCGCTGG CGGTGCAGGCCAAGGTGGAT 60 ATGGTGGCTT AGGGTCACAA G 81 81 base pairs nucleic acidsingle linear DNA (genomic) not provided 28 GTGACCCTAA GCCACCATATCCACCTTGGC CTGCACCGCC AGCGGCAGCT GCTGCTGCAC 60 CAGCGCCTTG ACCGGCCCCT T81 27 amino acids amino acid unknown unknown peptide not provided 29 GlyAla Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala 1 5 10 15Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln 20 25 90 base pairs nucleicacid single linear DNA (genomic) not provided 30 GGGCCGGTCG AGGTGGACAAGGTGCAGGTG CAGCCGCTGC TGCTGCGGGC GGCGCAGGTC 60 AAGGTGGGTA TGGGGGTTTAGGTTCACAAG 90 90 base pairs nucleic acid single linear DNA (genomic) notprovided 31 GTGAACCTAA ACCCCCATAC CCACCTTGAC CTGCGCCGCC CGCAGCAGCAGCGGCTGCAC 60 CTGCACCTTG TCCACCTCGA CCGGCCCCTT 90 30 amino acids aminoacid unknown unknown peptide not provided 32 Gly Ala Gly Arg Gly Gly GlnGly Ala Gly Ala Ala Ala Ala Ala Ala 1 5 10 15 Gly Gly Ala Gly Gln GlyGly Tyr Gly Gly Leu Gly Ser Gln 20 25 30 39 base pairs nucleic acidsingle linear DNA (genomic) not provided 33 GGGCCGGGCA AGGTGGTTACGGCGGTCTCG GATCACAAG 39 39 base pairs nucleic acid single linear DNA(genomic) not provided 34 
GTGATCCGAG ACCGCCGTAA CCACCTTGCC CGGCCCCTT 3913 amino acids amino acid unknown unknown peptide not provided 35 GlyAla Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln 1 5 10 32 base pairsnucleic acid single linear DNA (genomic) not provided 36 GATCTGCGGCCCAAGGGGCC CACAAGGTGA GG 32 32 base pairs nucleic acid single linear DNA(genomic) not provided 37 ACGCCGGGTT CCCCGGGTGT TCCACTCCCT AG 32 9 aminoacids amino acid unknown unknown peptide not provided 38 Ser Ala Ala GlnGly Ala His Lys Val 1 5 42 base pairs nucleic acid single linear DNA(genomic) not provided 39 GGATCCCATC ACCATCACCA TCACTCTAGA TCCGGCTGCT AA42 13 amino acids amino acid unknown unknown peptide not provided 40 GlySer His His His His His His Ser Arg Ser Gly Cys 1 5 10 66 base pairsnucleic acid single linear DNA (genomic) not provided 41 GATCTCCCGGGCCATCCGGC CCAGGTTCTG CGGCAGCGGC AGCAGCGGGC CCAGGGCAGC 60 AGCTGG 66 66base pairs nucleic acid single linear DNA (genomic) not provided 42GATCCCAGCT GCTGCCCTGG GCCCGCTGCT GCCGCTGCCG CAGAACCTGG GCCGGATGGC 60CCGGGA 66 21 amino acids amino acid unknown unknown peptide not provided43 Ser Pro Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Gly 1 510 15 Pro Gly Gln Gln Leu 20 72 base pairs nucleic acid single linearDNA (genomic) not provided 44 GATCTCCCGG GCCGGGCGGT TACGGTCCGGGTCAGCAAGG CCCAGGTGGC TACGGCCCAG 60 GCCAACAGCT GG 72 72 base pairsnucleic acid single linear DNA (genomic) not provided 45 GATCCCAGCTGTTGGCCTGG GCCGTAGCCA CCTGGGCCTT GCTGACCCGG ACCGTAACCG 60 CCCGGCCCGG GA72 23 amino acids amino acid unknown unknown peptide not provided 46 SerPro Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly 1 5 10 15Tyr Gly Pro Gly Gln Gln Leu 20 72 base pairs nucleic acid single linearDNA (genomic) not provided 47 GATCTCCCGG GCCATCTGGT CCGGGTAGCGCTGCGGCTGC TGCTGCTGCG GCAGGTCCAG 60 GCGGCTACGT AG 72 72 base pairsnucleic acid single linear DNA (genomic) not provided 48 GATCCTACGTAGCCGCCTGG ACCTGCCGCA GCAGCAGCAG CCGCAGCGCT ACCCGGACCA 60 GATGGCCCGG GA72 23 amino acids amino 
acid unknown unknown peptide not provided 49 SerPro Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala 1 5 10 15Ala Gly Pro Gly Gly Tyr Val 20 57 base pairs nucleic acid single linearDNA (genomic) not provided 50 GATCTCCCGG GCCGGGCCAA CAAGGTCCGGGCGGCTATGG TCCAGGTCAA CAGCTGG 57 57 base pairs nucleic acid singlelinear DNA (genomic) not provided 51 GATCCCAGCT GTTGACCTGG ACCATAGCCGCCCGGACCTT GTTGGCCCGG CCCGGGA 57 18 amino acids amino acid unknownunknown peptide not provided 52 Ser Pro Gly Pro Gly Gln Gln Gly Pro GlyGly Tyr Gly Pro Gly Gln 1 5 10 15 Gln Leu 75 base pairs nucleic acidsingle linear DNA (genomic) not provided 53 GATCTCCCGG GCCGAGCGGTCCAGGTTCCG CAGCAGCAGC GGCTGCGGCG GCAGCGGGTC 60 CAGGTGGTTA CGTAG 75 75base pairs nucleic acid single linear DNA (genomic) not provided 54GATCCTACGT AACCACCTGG ACCCGCTGCC GCCGCAGCCG CTGCTGCTGC GGAACCTGGA 60CCGCTCGGCC CGGGA 75 24 amino acids amino acid unknown unknown peptidenot provided 55 Ser Pro Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala AlaAla Ala 1 5 10 15 Ala Ala Gly Pro Gly Gly Tyr Val 20 87 base pairsnucleic acid single linear DNA (genomic) not provided 56 GATCTCCCGGGCCAGGCCAG CAGGGTCCGG GTGGCTATGG CCCAGGCCAG CAAGGTCCGG 60 GTGGTTACGGTCCAGGTCAG CAGCTGG 87 87 base pairs nucleic acid single linear DNA(genomic) not provided 57 GATCCCAGCT GCTGACCTGG ACCGTAACCA CCCGGACCTTGCTGGCCTGG GCCATAGCCA 60 CCCGGACCCT GCTGGCCTGG CCCGGGA 87 28 amino acidsamino acid unknown unknown peptide not provided 58 Ser Pro Gly Pro GlyGln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln 1 5 10 15 Gln Gly Pro GlyGly Tyr Gly Pro Gly Gln Gln Leu 20 25 493 amino acids amino acid unknownunknown protein not provided 59 Gly Pro Gly Gly Tyr Gly Pro Gly Gln GlnGly Pro Gly Gly Tyr Gly 1 5 10 15 Pro Gly Gln Gln Gly Pro Gly Arg TyrGly Pro Gly Gln Gln Gly Pro 20 25 30 Ser Gly Pro Gly Ser Ala Ala Ala AlaAla Ala Gly Ser Gly Gln Gln 35 40 45 Gly Pro Gly Gly Tyr Gly Pro Arg GlnGln Gly Pro Gly Gly Tyr Gly 50 55 60 Gln Gly Gln Gln Gly Pro Ser Gly ProGly Ser Ala Ala Ala Ala 
Ser 65 70 75 80 Ala Ala Ala Ser Ala Glu Ser GlyGly Pro Gly Gly Tyr Gly Pro Gly 85 90 95 Gln Gln Gly Pro Gly Gly Tyr GlyPro Gly Gln Gln Gly Pro Gly Gly 100 105 110 Tyr Gly Pro Gly Gln Gln GlyPro Ser Gly Pro Gly Ser Ala Ala Ala 115 120 125 Ala Ala Ala Ala Ala SerGly Pro Gly Gln Gln Gly Pro Gly Gly Tyr 130 135 140 Gly Pro Gly Gln GlnGly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly 145 150 155 160 Pro Ser GlyPro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ser Gly 165 170 175 Pro GlyGln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro 180 185 190 GlyGly Tyr Gly Pro Gly Gln Gln Gly Thr Ser Gly Pro Gly Ser Ala 195 200 205Ala Ala Ala Ala Ala Ala Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr 210 215220 Gly Pro Gly Gln Gln Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala 225230 235 240 Ala Ala Ala Ala Ala Gly Pro Gly Gly Tyr Gly Pro Gly Gln GlnGly 245 250 255 Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Ser Gly ProGly Ser 260 265 270 Ala Ala Ala Ala Ala Ala Ala Gly Pro Gly Gln Gln GlyLeu Gly Gly 275 280 285 Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr GlyPro Gly Gln Gln 290 295 300 Gly Pro Gly Gly Tyr Gly Pro Gly Ser Ala SerAla Ala Ala Ala Ala 305 310 315 320 Ala Gly Pro Gly Gln Gln Gly Pro GlyGly Tyr Gly Pro Gly Gln Gln 325 330 335 Gly Pro Ser Gly Pro Gly Ser AlaSer Ala Ala Ala Ala Ala Ala Ala 340 345 350 Ala Gly Pro Gly Gly Tyr GlyPro Gly Gln Gln Gly Pro Gly Gly Tyr 355 360 365 Ala Pro Gly Gln Gln GlyPro Ser Gly Pro Gly Ser Ala Ser Ala Ala 370 375 380 Ala Ala Ala Ala AlaAla Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln 385 390 395 400 Gly Pro GlyGly Tyr Ala Pro Gly Gln Gln Gly Pro Ser Gly Pro Gly 405 410 415 Ser AlaAla Ala Ala Ala Ala Ala Ser Ala Gly Pro Gly Gly Tyr Gly 420 425 430 ProAla Gln Gln Gly Pro Ser Gly Pro Gly Ile Ala Ala Ser Ala Ala 435 440 445Ser Ala Gly Pro Gly Gly Tyr Gly Pro Ala Gln Gln Gly Pro Ala Gly 450 455460 Tyr Gly Pro Gly Ser Ala Val Ala Ala Ser Ala Gly Ala Gly Ser Ala 465470 475 480 Gly Tyr Gly Pro Gly Ser Gln Ala Ser Ala Ala Ala Ser 485 490119 amino acids amino acid unknown 
unknown peptide not provided 60 GlyPro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Gly Pro Gly 1 5 10 15Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly 20 25 30Tyr Gly Pro Gly Gln Gln Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala 35 40 45Ala Ala Ala Ala Ala Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly 50 55 60Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Ser Gly Pro Gly Ser 65 70 7580 Ala Ala Ala Ala Ala Ala Ala Ala Ala Gly Pro Gly Gly Tyr Gly Pro 85 9095 Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly 100105 110 Gly Tyr Gly Pro Gly Gln Gln 115 714 amino acids amino acidunknown unknown protein not provided 61 Gly Pro Ser Gly Pro Gly Ser AlaAla Ala Ala Ala Ala Gly Pro Gly 1 5 10 15 Gln Gln Gly Pro Gly Gly TyrGly Pro Gly Gln Gln Gly Pro Gly Gly 20 25 30 Tyr Gly Pro Gly Gln Gln GlyPro Ser Gly Pro Gly Ser Ala Ala Ala 35 40 45 Ala Ala Ala Ala Ala Gly ProGly Gly Tyr Gly Pro Gly Gln Gln Gly 50 55 60 Pro Gly Gly Tyr Gly Pro GlyGln Gln Gly Pro Ser Gly Pro Gly Ser 65 70 75 80 Ala Ala Ala Ala Ala AlaAla Ala Ala Gly Pro Gly Gly Tyr Gly Pro 85 90 95 Gly Gln Gln Gly Pro GlyGly Tyr Gly Pro Gly Gln Gln Gly Pro Gly 100 105 110 Gly Tyr Gly Pro GlyGln Gln Gly Pro Ser Gly Pro Gly Ser Ala Ala 115 120 125 Ala Ala Ala AlaGly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro 130 135 140 Gly Gln GlnGly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Ser 145 150 155 160 GlyPro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Gly Pro Gly Gly 165 170 175Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln 180 185190 Gly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala 195200 205 Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly210 215 220 Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln GlyPro 225 230 235 240 Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Gly ProGly Gln Gln 245 250 255 Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly ProGly Gly Tyr Gly 260 265 270 Pro Gly Gln Gln Gly Pro Ser Gly Pro Gly SerAla Ala Ala Ala Ala 275 280 285 Ala Ala Ala Gly 
Pro Gly Gly Tyr Gly ProGly Gln Gln Gly Pro Gly 290 295 300 Gly Tyr Gly Pro Gly Gln Gln Gly ProSer Gly Pro Gly Ser Ala Ala 305 310 315 320 Ala Ala Ala Ala Ala Ala AlaGly Pro Gly Gly Tyr Gly Pro Gly Gln 325 330 335 Gln Gly Pro Gly Gly TyrGly Pro Gly Gln Gln Gly Pro Gly Gly Tyr 340 345 350 Gly Pro Gly Gln GlnGly Pro Ser Gly Pro Gly Ser Ala Ala Ala Ala 355 360 365 Ala Ala Gly ProGly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln 370 375 380 Gln Gly ProGly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Ser Gly Pro 385 390 395 400 GlySer Ala Ala Ala Ala Ala Ala Ala Ala Gly Pro Gly Gly Tyr Gly 405 410 415Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro 420 425430 Ser Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala Gly Pro 435440 445 Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly450 455 460 Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro SerGly 465 470 475 480 Pro Gly Ser Ala Ala Ala Ala Ala Ala Gly Pro Gly GlnGln Gly Pro 485 490 495 Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Gly GlyTyr Gly Pro Gly 500 505 510 Gln Gln Gly Pro Ser Gly Pro Gly Ser Ala AlaAla Ala Ala Ala Ala 515 520 525 Ala Gly Pro Gly Gly Tyr Gly Pro Gly GlnGln Gly Pro Gly Gly Tyr 530 535 540 Gly Pro Gly Gln Gln Gly Pro Ser GlyPro Gly Ser Ala Ala Ala Ala 545 550 555 560 Ala Ala Ala Ala Ala Gly ProGly Gly Tyr Gly Pro Gly Gln Gln Gly 565 570 575 Pro Gly Gly Tyr Gly ProGly Gln Gln Gly Pro Gly Gly Tyr Gly Pro 580 585 590 Gly Gln Gln Gly ProSer Gly Pro Gly Ser Ala Ala Ala Ala Ala Ala 595 600 605 Gly Pro Gly GlnGln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly 610 615 620 Pro Gly GlyTyr Gly Pro Gly Gln Gln Gly Pro Ser Gly Pro Gly Ser 625 630 635 640 AlaAla Ala Ala Ala Ala Ala Ala Gly Pro Gly Gly Tyr Gly Pro Gly 645 650 655Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln Gly Pro Ser Gly 660 665670 Pro Gly Ser Ala Ala Ala Ala Ala Ala Ala Ala Ala Gly Pro Gly Gly 675680 685 Tyr Gly Pro Gly Gln Gln Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln690 695 700 Gly Pro Gly Gly Tyr Gly Pro Gly Gln Gln 705 710 
101 aminoacids amino acid unknown unknown peptide not provided 62 Ser Gln Gly AlaGly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly 1 5 10 15 Ala Gly ArgGly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala 20 25 30 Ala Ala AlaGly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly Ala 35 40 45 Gly Gln GlyAla Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln 50 55 60 Gly Gly TyrGly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln 65 70 75 80 Gly AlaGly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly 85 90 95 Tyr GlyGly Leu Gly 100 604 amino acids amino acid unknown unknown protein notprovided 63 Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser GlnGly 1 5 10 15 Ala Gly Arg Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala AlaAla Ala 20 25 30 Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser GlnGly Ala 35 40 45 Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly AlaGly Gln 50 55 60 Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg GlyGly Gln 65 70 75 80 Gly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala GlyGln Gly Gly 85 90 95 Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly GlyTyr Gly Gly 100 105 110 Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Leu GlyGly Gln Gly Ala 115 120 125 Gly Ala Ala Ala Ala Ala Ala Ala Gly Gly AlaGly Gln Gly Gly Leu 130 135 140 Gly Ser Gln Gly Ala Gly Gln Gly Ala GlyAla Ala Ala Ala Ala Ala 145 150 155 160 Gly Gly Ala Gly Gln Gly Gly TyrGly Gly Leu Gly Ser Gln Gly Ala 165 170 175 Gly Arg Gly Gly Gln Gly AlaGly Ala Ala Ala Ala Ala Ala Gly Gly 180 185 190 Ala Gly Gln Gly Gly TyrGly Gly Leu Gly Ser Gln Gly Ala Gly Gln 195 200 205 Gly Gly Tyr Gly GlyLeu Gly Ser Gln Gly Ala Gly Arg Gly Gly Leu 210 215 220 Gly Gly Gln GlyAla Gly Ala Ala Ala Ala Ala Ala Ala Gly Gly Ala 225 230 235 240 Gly GlnGly Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly Ala Gly Ala 245 250 255 AlaAla Ala Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu 260 265 270Gly Ser Gln Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala Ala 275 280285 Ala Ala Ala Gly Gly Ala Gly Gly Gly Tyr Gly Gly Leu Gly Ser Gly 
290295 300 Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg305 310 315 320 Gly Gly Leu Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala AlaAla Ala 325 330 335 Gly Gly Ala Gly Gln Gly Gly Leu Gly Ser Gln Gly AlaGly Gln Gly 340 345 350 Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala GlyGln Gly Gly Tyr 355 360 365 Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg GlyGly Gln Gly Ala Gly 370 375 380 Ala Ala Ala Ala Ala Ala Gly Gly Ala GlyGln Gly Gly Tyr Gly Gly 385 390 395 400 Leu Gly Ser Gln Gly Ala Gly GlnGly Gly Tyr Gly Gly Leu Gly Ser 405 410 415 Gln Gly Ala Gly Arg Gly GlyLeu Gly Gly Gln Gly Ala Gly Ala Ala 420 425 430 Ala Ala Ala Ala Ala GlyGly Ala Gly Gln Gly Gly Leu Gly Ser Gln 435 440 445 Gly Ala Gly Gln GlyAla Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala 450 455 460 Gly Gln Gly GlyTyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly 465 470 475 480 Gly GlnGly Ala Gly Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln 485 490 495 GlyGly Tyr Gly Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly Gly Tyr 500 505 510Gly Gly Leu Gly Ser Gln Gly Ala Gly Arg Gly Gly Leu Gly Gly Gln 515 520525 Gly Ala Gly Ala Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly Gln Gly 530535 540 Gly Leu Gly Ser Gln Gly Ala Gly Gln Gly Ala Gly Ala Ala Ala Ala545 550 555 560 Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu GlySer Gln 565 570 575 Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala AlaAla Ala Ala 580 585 590 Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly595 600 39 base pairs nucleic acid single linear DNA (genomic) notprovided 64 GATCTCAGGG TGCTGGCCAG GGTGGCTATG GTGGCCTGG 39 39 base pairsnucleic acid single linear DNA (genomic) not provided 65 GATCCCAGGCCACCATAGCC ACCCTGGCCA GCACCCTGA 39 13 amino acids amino acid unknownunknown peptide not provided 66 Ser Gln Gly Ala Gly Gln Gly Gly Tyr GlyGly Leu Gly 1 5 10 93 base pairs nucleic acid single linear DNA(genomic) not provided 67 GATCTCAAGG CGCTGGTCGC GGTGGCCTGG GTGGCCAGGGTGCAGGTGCT GCTGCTGCTG 60 CGGCTGCTGG TGGTGCAGGT CAGGGTGGTC TGG 93 93 basepairs nucleic acid single 
linear DNA (genomic) not provided 68GATCCCAGAC CACCCTGACC TGCACCACCA GCAGCCGCAG CAGCAGCAGC ACCTGCACCC 60TGGCCACCCA GGCCACCGCG ACCAGCGCCT TGA 93 31 amino acids amino acidunknown unknown peptide not provided 69 Ser Gln Gly Ala Gly Arg Gly GlyLeu Gly Gly Gln Gly Ala Gly Ala 1 5 10 15 Ala Ala Ala Ala Ala Ala GlyGly Ala Gly Gln Gly Gly Leu Gly 20 25 30 81 base pairs nucleic acidsingle linear DNA (genomic) not provided 70 GATCTCAGGG CGCAGGTCAAGGTGCTGGTG CAGCTGCGGC GGCAGCTGGT GGCGCGGGTC 60 AAGGTGGCTA CGGCGGTTTA G81 81 base pairs nucleic acid single linear DNA (genomic) not provided71 GATCCTAAAC CGCCGTAGCC ACCTTGACCC GCGCCACCAG CTGCCGCCGC AGCTGCACCA 60GCACCTTGAC CTGCGCCCTG A 81 28 amino acids amino acid unknown unknownpeptide not provided 72 Ser Gln Gly Ala Gly Gln Gly Ala Gly Ala Ala AlaAla Ala Ala Gly 1 5 10 15 Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly LeuGly 20 25 90 base pairs nucleic acid single linear DNA (genomic) notprovided 73 GATCTCAAGG TGCGGGTCGC GGTGGTCAGG GCGCTGGTGC AGCAGCGGCAGCAGCAGGTG 60 GCGCTGGCCA AGGTGGTTAC GGTGGTCTTG 90 90 base pairs nucleicacid single linear DNA (genomic) not provided 74 GATCCAAGAC CACCGTAACCACCTTGGCCA GCGCCACCTG CTGCTGCCGC TGCTGCACCA 60 GCGCCCTGAC CACCGCGACCCGCACCTTGA 90 30 amino acids amino acid unknown unknown peptide notprovided 75 Ser Gln Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala AlaAla 1 5 10 15 Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly 2025 30 18 base pairs nucleic acid single linear DNA (genomic) notprovided 76 AATTCAGATC TAAGCTTG 18 18 base pairs nucleic acid singlelinear DNA (genomic) not provided 77 GATCCAAGCT TAGATCTG 18 4909 basepairs nucleic acid single circular DNA (genomic) not provided 78GAATTCCGGG GGATTATGCG TTAAGCATAA AGTGTAAAGC CTGGGGTGCC TAATGAGTGA 60GCTAACTCAC ATTAATTGCG TTGCGCTCAC TGCCCGCTTT CCAGTCGGGA AACCTGTCGT 120GCCAGCTGCA TTAATGAATC GGCCAACGCG CGGGGAGAGG CGGTTTGCGT ATTGGGCGCC 180AGGGTGGTTT TTCTTTTCAC CAGTGAGACG GGCAACAGCT GATTGCCCTT CACCGCCTGG 240CCCTGAGAGA GTTGCAGCAA GCGGTCCACG CTGGTTTGCC 
CCAGCAGGCG AAAATCCTGT 300TTGATGGTGG TTGACGGCGG GATATAACAT GAGCTGTCTT CGGTATCGTC GTATCCCACT 360ACCGAGATAT CCGCACCAAC GCGCAGCCCG GACTCGGTAA TGGCGCGCAT TGCGCCCAGC 420GCCATCTGAT CGTTGGCAAC CAGCATCGCA GTGGGAACGA TGCCCTCATT CAGCATTTGC 480ATGGTTTGTT GAAAACCGGA CATGGCACTC CAGTCGCCTT CCCGTTCCGC TATCGGCTGA 540ATTTGATTGC GAGTGAGATA TTTATGCCAG CCAGCCAGAC GCAGACGCGC CGAGACAGAA 600CTTAATGGGC CCGCTAACAG CGCGATTTGC TGGTGACCCA ATGCGACCAG ATGCTCCACG 660CCCAGTCGCG TACCGTCTTC ATGGGAGAAA ATAATACTGT TGATGGGTGT CTGGTCAGAG 720ACATCAAGAA ATAACGCCGG AACATTAGTG CAGGCAGCTT CCACAGCAAT GGCATCCTGG 780TCATCCAGCG GATAGTTAAT GATCAGCCCA CTGACGCGTT GCGCGAGAAG ATTGTGCACC 840GCCGCTTTAC AGGCTTCGAC GCCGCTTCGT TCTACCATCG ACACCACCAC GCTGGCACCC 900AGTTGATCGG CGCGAGATTT AATCGCCGCG ACAATTTGCG ACGGCGCGTG CAGGGCCAGA 960CTGGAGGTGG CAACGCCAAT CAGCAACGAC TGTTTGCCCG CCAGTTGTTG TGCCACGCGG 1020TTGGGAATGT AATTCAGCTC CGCCATCGCC GCTTCCACTT TTTCCCGCGT TTTCGCAGAA 1080ACGTGGCTGG CCTGGTTCAC CACGCGGGAA ACGGTCTGAT AAGAGACACC GGCATACTCT 1140GCGACATCGT ATAACGTTAC TGGTTTCACA TTCACCACCC TGAATTGACT CTCTTCCGGG 1200CGCTATCATG CCATACCGCG AAAGGTTTTG CGCCATTCGA TGGTGTCAAC CTTGCAGAGC 1260TGCGCCTTTA TTATTATCCG CCGGGAGAAA ATATTCCGTG GATCTAACGG GATGCGTTAT 1320GTTGAAGTGA GACCGGTCGA CGCATGCCAG GACAACTTCT GGTCCGGTAA CGTGCTGAGC 1380CCGGCCAAGC TTACTCCCCA TCCCCCTGTT GACAATTAAT CATCGGCTCG TATAATGTGT 1440GGAATTGTGA GCGGATAACA ATTTCACACA GGAAACAGGA TCACTAAGGA GGTTTAAATA 1500TGGCTACTGT TATAGATCCG TCTGTCGCGA CGGCCGTTTC GTCGAATGGC TCGGTTGCCA 1560ATATCAATGC GATCAAGTCG GGCGCTCTGG AGTCCGGCTT TACGCAGTCA GACGTTGCCT 1620ATTGGGCCTA TAACGGCACC GGCCTTTATG ATGGCAAGGG CAAGGTGGAA GATTTGCGCC 1680TTCTGGCGAC GCTTTACCCG GAAACGATCC ATATCGTTGC GCGTAAGGAT GCAAACATCA 1740AATCGGTCGC AGACCTGAAA GGCAAGCGCG TTTCGCTGGA TGAGCCGGGT TCTGGCACCA 1800TCGTCGATGC GCGTATCGTT CTTGAAGCCT ACGGCCTCAC GGAAGACGAT ATCAAGGCTG 1860AACACCTGAA GCCGGGACCG GCAGGCGAGA GGCTGAAAGA TGGTGCGCTG GACGCCTATT 1920TCTTTGTGGG CGGCTATCCG ACGGGCGCAA TCTCGGAACT GGCCATCTCG AACGGTATTT 1980CGCTCGTTCC GATCTCCGGG 
CCGGAAGCGG ACAAGATTCT GGAGAAATAT TCCTTCTTCT 2040CGAAGGATGT GGTTCCTGCC GGAGCCTATA AGGACGTGGC GGAAACACCG ACCCTTGCCG 2100TTGCCGCACA GTGGGTGACG AGCGCCAAGC AGCCGGACGA CCTCATCTAT AACATCACCA 2160AGGCTGGTTC TCCGAAACCG GGTGCTGGTA GATCTAAGCT TCCCGGGGAT CCTAGCTAGC 2220TAGCCATGGC ATCACAGTAT CGTGATGACA GAGGCAGGGA GTGGGACAAA ATTGAAATCA 2280AATAATGATT TTATTTTGAC TGATAGTGAC CTGTTCGTTG CAACAAATTG ATAAGCAATG 2340CTTTTTTATA ATGCCAACTT AGTATAAAAA AGCTGAACGA GAAACGTAAA ATGATATAAA 2400TATCAATATA TTAAATTAGA TTTTGCATAA AAAACAGACT ACATAATACT GTAAAACACA 2460ACATATGCAG TCACTATGAA TCAACTACTT AGATGGTATT AGTGACCTGT AACAGAGCAT 2520TAGCGCAAGG TGATTTTTGT CTTCTTGCGC TAATTTTTTG TCATCAAACC TGTCGCACTC 2580CAGAGAAGCA CAAAGCCTCG CAATCCAGTG CAAAGCTCTG CCTCGCGCGT TTCGGTGATG 2640ACGGTGAAAA CCTCTGACAC ATGCAGCTCC CGGAGACGGT CACAGCTTGT CTGTAAGCGG 2700ATGCCGGGAG CAGACAAGCC CGTCAGGGCG CGTCAGCGGG TGTTGGCGGG TGTCGGGGCG 2760CAGCCATGAC CCAGTCACGT AGCGATAGCG GAGTGTATAC TGGCTTAACT ATGCGGCATC 2820AGAGCAGATT GTACTGAGAG TGCACCATAT GCGGTGTGAA ATACCGCACA GATGCGTAAG 2880GAGAAAATAC CGCATCAGGC GCTCTTCCGC TTCCTCGCTC ACTGACTCGC TGCGCTCGGT 2940CGTTCGGCTG CGGCGAGCGG TATCAGCTCA CTCAAAGGCG GTAATACGGT TATCCACAGA 3000ATCAGGGGAT AACGCAGGAA AGAACATGTG AGCAAAAGGC CAGCAAAAGG CCAGGAACCG 3060TAAAAAGGCC GCGTTGCTGG CGTTTTTCCA TAGGCTCCGC CCCCCTGACG AGCATCACAA 3120AAATCGACGC TCAAGTCAGA GGTGGCGAAA CCCGACAGGA CTATAAAGAT ACCAGGCGTT 3180TCCCCCTGGA AGCTCCCTCG TGCGCTCTCC TGTTCCGACC CTGCCGCTTA CCGGATACCT 3240GTCCGCCTTT CTCCCTTCGG GAAGCGTGGC GCTTTCTCAT AGCTCACGCT GTAGGTATCT 3300CAGTTCGGTG TAGGTCGTTC GCTCCAAGCT GGGCTGTGTG CACGAACCCC CCGTTCAGCC 3360CGACCGCTGC GCCTTATCCG GTAACTATCG TCTTGAGTCC AACCCGGTAA GACACGACTT 3420ATCGCCACTG GCAGCAGCCA CTGGTAACAG GATTAGCAGA GCGAGGTATG TAGGCGGTGC 3480TACAGAGTTC TTGAAGTGGT GGCCTAACTA CGGCTACACT AGAAGGACAG TATTTGGTAT 3540CTGCGCTCTG CTGAAGCCAG TTACCTTCGG AAAAAGAGTT GGTAGCTCTT GATCCGGCAA 3600ACAAACCACC GCTGGTAGCG GTGGTTTTTT TGTTTGCAAG CAGCAGATTA CGCGCAGAAA 3660AAAAGGATCT CAAGAAGATC CTTTGATCTT TTCTACGGGG TCTGACGCTC 
AGTGGAACGA 3720AAACTCACGT TAAGGGATTT TGGTCATGAG ATTATCAAAA AGGATCTTCA CCTAGATCCT 3780TTTAAATTAA AAATGAAGTT TTAAATCAAT CTAAAGTATA TATGAGTAAA CTTGGTCTGA 3840CAGTTACCAA TGCTTAATCA GTGAGGCACC TATCTCAGCG ATCTGTCTAT TTCGTTCATC 3900CATAGTTGCC TGACTCCCCG TCGTGTAGAT AACTACGATA CGGGAGGGCT TACCATCTGG 3960CCCCAGTGCT GCAATGATAC CGCGAGACCC ACGCTCACCG GCTCCAGATT TATCAGCAAT 4020AAACCAGCCA GCCGGAAGGG CCGAGCGCAG AAGTGGTCCT GCAACTTTAT CCGCCTCCAT 4080CCAGTCTATT AATTGTTGCC GGGAAGCTAG AGTAAGTAGT TCGCCAGTTA ATAGTTTGCG 4140CAACGTTGTT GCCATTGCTG CAGGCATCGT GGTGTCACGC TCGTCGTTTG GTATGGCTTC 4200ATTCAGCTCC GGTTCCCAAC GATCAAGGCG AGTTACATGA TCCCCCATGT TGTGCAAAAA 4260AGCGGTTAGC TCCTTCGGTC CTCCGATCGT TGTCAGAAGT AAGTTGGCCG CAGTGTTATC 4320ACTCATGGTT ATGGCAGCAC TGCATAATTC TCTTACTGTC ATGCCATCCG TAAGATGCTT 4380TTCTGTGACT GGTGAGTACT CAACCAAGTC ATTCTGAGAA TAGTGTATGC GGCGACCGAG 4440TTGCTCTTGC CCGGCGTCAA CACGGGATAA TACCGCGCCA CATAGCAGAA CTTTAAAAGT 4500GCTCATCATT GGAAAACGTT CTTCGGGGCG AAAACTCTCA AGGATCTTAC CGCTGTTGAG 4560ATCCAGTTCG ATGTAACCCA CTCGTGCACC CAACTGATCT TCAGCATCTT TTACTTTCAC 4620CAGCGTTTCT GGGTGAGCAA AAACAGGAAG GCAAAATGCC GCAAAAAAGG GAATAAGGGC 4680GACACGGAAA TGTTGAATAC TCATACTCTT CCTTTTTCAA TATTATTGAA GCATTTATCA 4740GGGTTATTGT CTCATGAGCG GATACATATT TGAATGTATT TAGAAAAATA AACAAATAGG 4800GGTTCCGCGC ACATTTCCCC GAAAAGTGCC ACCTGACGTC TAAGAAACCA TTATTATCAT 4860GACATTAACC TATAAAAATA GGCGTATCAC GAGGCCCTTT CGTCTTCAA 4909 9144 basepairs nucleic acid single circular DNA (genomic) not provided 79AATTCGAGCT CGGTACCCAT CGAATTCCTT CAGGAAAAGA ACGATGGCTG TCTTATTAGC 60GGTTGCAGGC ACATTTATTT TGGTCACACA CGGGAATGTC GGCAGCCTGT CTATATCCGG 120TCTGGCTGTT TTTTGGGGCA TCAGCTCGGC ATTTGCGCTG GCGTTTTACA CCCTCCAGCC 180GCATCGGCTT TTGAAGAAAT GGGGCTCCGC CATTATTGTC GGATGGGGCA TGCTGATGCG 240GAGCCGTTCT CAGCCTGATT CAGCCGCCTT GGAAGTTTGA AGGCCAATGG TCGTTGTCCG 300CATATGCCGC GATCGTGTTT ATCATCATTT TCGGAACGCT CATCGCTTTT TATTGCTATT 360TGGAAAGCCT GAAATATCTG AGTGCCTCTG AAACCAGCCT CCTCGCCTGT GCAGAGCCGC 420TGTCAGCAGC TTTTTTAGCG GTGATCTGGC 
TGCATGTTCC CTTCGGAATA TCAGAATGGC 480TGGGTACTTT ACTGATTTTA GCCACCATCG CTTATTATCT ATCAAGAAAA AATAACCTCT 540CTTTTTTTAG AGAGGTTTTT CCCTAGGCCT GAAGCACCCT TTAGTCTCAA TTACCCATAA 600ATTAAAAGGC CTTTTTTCGT TTTACTATCA TTCAAAAGAG GAAAATAGAC CAGTTGTCAA 660TAGAATCAGA GTCTAATAGA ATGAGGTCGA AAAGTAAATC ACGCAGGATT GTTACTGATA 720AAGCAGGCAA GACCTAAAAT GTGTTAAGGG CAAAGTGTAT TCTTTGGCGT CATCCCTTAC 780ATATTTTGGG TCTTTTTTTC TGTAACAAAC CTGCCATCCA TGAATTCGGG AGGATCGAAA 840CGGCAGATCG CAAAAACAGT ACATACAGAA GGAGACATGA ACATGAACAT CAAAAAAATT 900GTAAAACAAG CCACAGTACT GACTTTTACG ACTGCACTGC TAGCAGGAGG AGCGACTCAA 960GCCTTCGCGA AAGAAGATAT CGATCAACGC AATGGTTTTA TCCAAAGCCT TAAAGATGAT 1020CCAAGCCAAA GTGCTAACGT TTTAGGTGAA GCTCAAAAAC TTAATGACTC TCAAGCTCCA 1080AAAGCTGATG CGCAACAAAA TAACTTCAAC AAAGATCAAC AAAGCGCCTT CTATGAAATC 1140TTGAACATGC CTAACTTAAA CGAAGCGCAA CGTAACGGCT TCATTCAAAG TCTTAAAGAC 1200GACCCAAGCC AAAGCACTAA CGTTTTAGGT GAAGCTAAAA AATTAAACGA ATCTCAAGCA 1260CCGAAAGCTG ATAACAATTT CAACAAAGAA CAACAAAATG CTTTCTATGA AATCTTGAAT 1320ATGCCTAACT TAAACGAAGA ACAACGCAAT GGTTTCATCC AAAGCTTAAA AGATGACCCA 1380AGCCAAAGTG CTAACCTATT GTCAGAAGCT AAAAAGTTAA ATGAATCTCA AGCACCGAAA 1440GCGGATAACA AATTCAACAA AGAACAACAA AATGCTTTCT ATGAAATCTT ACATTTACCT 1500AACTTAAACG AAGAACAACG CAATGGTTTC ATCCAAAGCC TAAAAGATGA CCCAAGCCAA 1560AGCGCTAACC TTTTAGCAGA AGCTAAAAAG CTAAATGATG CTCAAGCACC AAAAGCTGAC 1620AACAAATTCA ACAAAGAACA ACAAAATGCT TTCTATGAAA TTTTACATTT ACCTAACTTA 1680ACTGAAGAAC AACGTAACGG CTTCATCCAA AGCCTTAAAG ACGATCCGGG GAATTCCCGG 1740GGATCCGTCG ACCTGCAGGC ATGCAAGCTT ACTCCCCATC CCCTCCAGTA ATGACCTCAG 1800AACTCCATCT GGATTTGTTC AGAACGCTCG GTTGCCGCCG GGCGTTTTTT ATTGGTGAGA 1860ATCGCAGCAA CTTGTCGCGC CAATCGAGCC ATGTCGTCGT CAACGACCCC CCATTCAAGA 1920ACAGCAAGCA GCATTGAGAA CTTTGGAATC CAGTCCCTCT TCCACCTGCT GAGGGCAATA 1980AGGGCTGCAC GCGCACTTTT ATCCGCCTCT GCTGCGCTCC GCCACCGTAG TTAAATTTAT 2040GGTTGGTTAT GAAATGCTGG CAGAGACCCA GCGAGACCTG ACCGCAGAAC AGGCAGCAGA 2100GCGTTTGCGC GCAGTCAGCG ATACCCCGGT TGATAATCAG AAAAGCCCCA AAAACAGGAA 2160GATTGTATAA 
GCAAATATTT AAATTGTAAA CGTTAATATT TTGTTAAAAT TCGCGTTAAA 2220TTTTTGTTAA ATCAGCTCAT TTTTTAACCA ATAGGCCGAA ATCGGCAAAA TCCCTTATAA 2280ATCAAAAGAA TAGCCCGAGA TAGGGTTGAG TGTTGTTCCA GTTTGGAACA AGAGTCCACT 2340ATTAAAGAAC GTGGACTCCA ACGTCAAAGG GCGAAAAACC GTCTATCAGG GCGATGGCCC 2400ACTACGTGAA CCATCACCCA AATCAAGTTT TTTGGGGTCG AGGTGCCGTA AAGCACTAAA 2460TCGGAACCCT AAAGGGAGCC CCCGATTTAG AGCTTGACGG GGAAAGCCGG CGAACGTGGC 2520GAGAAAGGAA GGGAAGAAAG CGAAAGGAGC GGGCGCTAGG GCGCGAGCAA GTGTAGCGGT 2580CACGCGCGCG TAACCACCAC ACCCGCCGCG CTTAATGCGC CGCTACAGGG CGCGTATCCA 2640TTTTCGCGAA TCCGGAGTGT AAGAAATGAG TCTGAAAGAA AAAACACAAT CTCTGTTTGC 2700CAACGCATTT GGCTACCCTG CCACTCACAC CATTCAGGTG CGTCATATAC TGACTGAAAA 2760CGCCCGCACC GTTGAAGCTG CCAGCGCGCT GGAGCAAGGC GACCTGAAAC GTATGGGCGA 2820GTTGATGGCG GAGTCTCATG CCTCTATGCG CGATGATTTC GAAATCACCG TGCCGCAAAT 2880TGACACTCTG GTAGAAATCG TCAAAGCTGT GATTGGCGAC AAAGGTGGCG TACGCATGAC 2940CGGCGGCGGA TTTGGCGGCT GTATCGTCGC GCGTATCCCG GAAGAGCTGG TGCCTGCCGC 3000ACAGCAAGCT GTCGCTGAAC AATATGAAGC AAAAACAGGT ATTAAAGAGA CTTTTTACGT 3060TTGTAAACCA TCACAAGGAG CAGGACAGTG CTGAACGAAA CTCCCGCACT GGCACCCGAT 3120GGCAGCCGTA CCGACTGTTC TGCCTCGCGC GTTTCGGTGA TGACGGTGAA AACCTCTGAC 3180ACATGCAGCT CCCGGAGACG GTCACAGCTT GTCTGTAAGC GGATGCCGGG AGCAGACAAG 3240CCCGTCAGGG CGCGTCAGCG GGTGTTGGCG GGTGTCGGGG CGCAGCCATG ACCCAGTCAC 3300GTAGCGATAG CGGAGTGTAT ACTGGCTTAA CTATGCGGCA TCAGAGCAGA TTGTACTGAG 3360AGTGCACCAT ATGCGGTGTG AAATACCGCA CAGATGCGTA AGGAGAAAAT ACCGCATCAG 3420GCGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG GTCGTTCGGC TGCGGCGAGC 3480GGTATCAGCT CACTCAAAGG CGGTAATACG GTTATCCACA GAATCAGGGG ATAACGCAGG 3540AAAGAACATG TGAGCAAAAG GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT 3600GGCGTTTTTC CATAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATCGAC GCTCAAGTCA 3660GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCAGGCG TTTCCCCCTG GAAGCTCCCT 3720CGTGCGCTCT CCTGTTCCGA CCCTGCCGCT TACCGGATAC CTGTCCGCCT TTCTCCCTTC 3780GGGAAGCGTG GCGCTTTCTC ATAGCTCACG CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT 3840TCGCTCCAAG CTGGGCTGTG TGCACGAACC CCCCGTTCAG 
CCCGACCGCT GCGCCTTATC 3900CGGTAACTAT CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC TGGCAGCAGC 3960CACTGGTAAC AGGATTAGCA GAGCGAGGTA TGTAGGCGGT GCTACAGAGT TCTTGAAGTG 4020GTGGCCTAAC TACGGCTACA CTAGAAGGAC AGTATTTGGT ATCTGCGCTC TGCTGAAGCC 4080AGTTACCTTC GGAAAAAGAG TTGGTAGCTC TTGATCCGGC AAACAAACCA CCGCTGGTAG 4140CGGTGGTTTT TTTGTTTGCA AGCAGCAGAT TACGCGCAGA AAAAAAGGAT CTCAAGAAGA 4200TCCTTTGATC TTTTCTACGG GGTCTGACGC TCAGTGGAAC GAAAACTCAC GTTAAGGGAT 4260TTTGGTCATG AGATTATCAA AAAGGATCTT CACCTAGATC CTTTTAAATT AAAAATGAAG 4320TTTTAAATCA ATCTAAAGTA TATATGAGTA AACTTGGTCT GACAGTTACC AATGCTTAAT 4380CAGTGAGGCA CCTATCTCAG CGATCTGTCT ATTTCGTTCA TCCATAGTTG CCTGACTCCC 4440CGTCGTGTAG ATAACTACGA TACGGGAGGG CTTACCATCT GGCCCCAGTG CTGCAATGAT 4500ACCGCGAGAC CCACGCTCAC CGGCTCCAGA TTTATCAGCA ATAAACCAGC CAGCCGGAAG 4560GGCCGAGCGC AGAAGTGGTC CTGCAACTTT ATCCGCCTCC ATCCAGTCTA TTAATTGTTG 4620CCGGGAAGCT AGAGTAAGTA GTTCGCCAGT TAATAGTTTG CGCAACGTTG TTGCCATTGC 4680TACAGGCATC GTGGTGTCAC GCTCGTCGTT TGGTATGGCT TCATTCAGCT CCGGTTCCCA 4740ACGATCAAGG CGAGTTACAT GATCCCCCAT GTTGTGCAAA AAAGCGGTTA GCTCCTTCGG 4800TCCTCCGATC GTTGTCAGAA GTAAGTTGGC CGCAGTGTTA TCACTCATGG TTATGGCAGC 4860ACTGCATAAT TCTCTTACTG TCATGCCATC CGTAAGATGC TTTTCTGTGA CTGGTGAGTA 4920CTCAACCAAG TCATTCTGAG AATAGTGTAT GCGGCGACCG AGTTGCTCTT GCCCGGCGTC 4980AACACGGGAT AATACCGCGC CACATAGCAG AACTTTAAAA GTGCTCATCA TTGGAAAACG 5040TTCTTCGGGG CGAAAACTCT CAAGGATCTT ACCGCTGTTG AGATCCAGTT CGATGTAACC 5100CACTCGTGCA CCCAACTGAT CTTCAGCATC TTTTACTTTC ACCAGCGTTT CTGGGTGAGC 5160AAAAACAGGA AGGCAAAATG CCGCAAAAAA GGGAATAAGG GCGACACGGA AATGTTGAAT 5220ACTCATACTC TTCCTTTTTC AATATTATTG AAGCATTTAT CAGGGTTATT GTCTCATGAG 5280CGGATACATA TTTGAATGTA TTTAGAAAAA TAAACAAATA GGGGTTCCGC GCACATTTCC 5340CCGAAAAGTG CCACCTGACG TCTAAGAAAC CATTATTATC ATGACATTAA CCTATAAAAA 5400TAGGCGTATC ACGAGGCCCT TTCGTCTTCA AGCCCGAGGT AACAAAAAAA CAACAGCATA 5460AATAACCCCG CTCTTACACA TTCCAGCCCT GAAAAAGGGC ATCAAATTAA ACCACACCTA 5520TGGTGTATGC ATTTATTTGC ATACATTCAA TCAATTGTTA TCTAAGGAAA TACTTACATA 5580TGGTTCGTGC 
AAACAAACGC AACGAGGCTC TACGAATCGA TGCATGCAGC TGATTTCACT 5640TTTTGCATTC TACAAACTGC ATAACTCATA TGTAAATCGC TCCTTTTTAG GTGGCACAAA 5700TGTGAGGCAT TTTCGCTCTT TCCGGCAACC ACTTCCAAGT AAAGTATAAC ACACTATACT 5760TTATATTCAT AAAGTGTGTG CTCTGCGAGG CTGTCGGCAG TGCCGACCAA AACCATAAAA 5820CCTTTAAGAC CTTTCTTTTT TTTACGAGAA AAAAGAAACA AAAAAACCTG CCCTCTGCCA 5880CCTCAGCAAA GGGGGGTTTT GCTCTCGTGC TCGTTTAAAA ATCAGCAAGG GACAGGTAGT 5940ATTTTTTGAG AAGATCACTC AAAAAATCTC CACCTTTAAA CCCTTGCCAA TTTTTATTTT 6000GTCCGTTTTG TCTAGCTTAC CGAAAGCCAG ACTCAGCAAG AATAAAATTT TTATTGTCTT 6060TCGGTTTTCT AGTGTAACGG ACAAAACCAC TCAAAATAAA AAAGATACAA GAGAGGTCTC 6120TCGTATCTTT TATTCAGCAA TCGCGCCCGA TTGCTGAACA GATTAATAAT AGATTTTAGC 6180TTTTTATTTG TTGAAAAAAG CTAATCAAAT TGTTGTCGGG ATCAATTACT GCAAAGTCTC 6240GTTCATCCCA CCACTGATCT TTTAATGATG TATTGGGGTG CAAAATGCCC AAAGGCTTAA 6300TATGTTGATA TAATTCATCA ATTCCCTCTA CTTCAATGCG GCAACTAGCA GTACCAGCAA 6360TAAACGACTC CGCACCTGTA CAAACCGGTG AATCATTACT ACGAGAGCGC CAGCCTTCAT 6420CACTTGCCTC CCATAGATGA ATCCGAACCT CATTACACAT TAGAACTGCG AATCCATCTT 6480CATGGTGAAC CAAAGTGAAA CCTAGTTTAT CGCAATAAAA ACCTATACTC TTTTTAATAT 6540CCCCGACTGG CAATGCCGGG ATAGACTGTA ACATTCTCAC GCATAAAATC CCCTTTCATT 6600TTCTAATGTA AATCTATTAC CTTATTATTA ATTCAATTCG CTCATAATTA ATCCTTTTTC 6660TTATTACGCA AAATGGCCCG ATTTAAGCAC ACCCTTTATT CCGTTAATGC GCCATGACAG 6720CCATGATAAT TACTAATACT AGGAGAAGTT AATAAATACG TAACCAACAT GATTAACAAT 6780TATTAGAGGT CATCGTTCAA AATGGTATGC GTTTTGACAC ATCCACTATA TATCCGTGTC 6840GTTCTGTCCA CTCCTGAATC CCATTCCAGA AATTCTCTAG CGATTCCAGA AGTTTCTCAG 6900AGTCGGAAAG TTGACCAGAC ATTACGAACT GGCACAGATG GTCATAACCT GAAGGAAGAT 6960CTGATTGCTT AACTGCTTCA GTTAAGACCG AAGCGCTCGT CGTATAACAG ATGCGATGAT 7020GCAGACCAAT CAACATGGCA CCTGCCATTG CTACCTGTAC AGTCAAGGAT GGTAGAAATG 7080TTGTCGGTCC TTGCACACGA ATATTACGCC ATTTGCCTGC ATATTCAAAC AGCTCTTCTA 7140CGATAAGGGC ACAAATCGCA TCGTGGAACG TTTGGGCTTC TACCGATTTA GCAGTTTGAT 7200ACACTTTCTC TAAGTATCCA CCTGAATCAT AAATCGGCAA AATAGAGAAA AATTGACCAT 7260GTGTAAGCGG CCAATCTGAT TCCACCTGAG ATGCATAATC 
TAGTAGAATC TCTTCGCTAT 7320CAAAATTCAC TTCCACCTTC CACTCACCGG TTGTCCATTC ATGGCTGAAC TCTGCTTCCT 7380CTGTTGACAT GACACACATC ATCTCAATAT CCGAATAGGG CCCATCAGTC TGACGACCAA 7440GAGAGCCATA AACACCAATA GCCTTAACAT CATCCCCATA TTTATCCAAT ATTCGTTCCT 7500TAATTTCATG AACAATCTTC ATTCTTTCTT CTCTAGTCAT TATTATTGGT CCATTCACTA 7560TTCTCATTCC CTTTTCAGAT AATTTTAGAT TTGCTTTTCT AAATAAGAAT ATTTGGAGAG 7620CACCGTTCTT ATTCAGCTAT TAATAACTCG TCTTCCTAAG CATCCTTCAA TCCTTTTAAT 7680AACAATTATA GCATCTAATC TTCAACAAAC TGGCCCGTTT GTTGAACTAC TCTTTAATAA 7740AATAATTTTT CCGTTCCCAA TTCCACATTG CAATAATAGA AAATCCATCT TCATCGGCTT 7800TTTCGTCATC ATCTGTATGA ATCAAATCGC CTTCTTCTGT GTCATCAAGG TTTAATTTTT 7860TATGTATTTC TTTTAACAAA CCACCATAGG AGATTAACCT TTTACGGTGT AAACCTTCCT 7920CCAAATCAGA CAAACGTTTC AAATTCTTTT CTTCATCATC GGTCATAAAA TCCGTATCCT 7980TTACAGGATA TTTTGCAGTT TCGTCAATTG CCGATTGTAT ATCCGATTTA TATTTATTTT 8040TCGGTCGAAT CATTTGAACT TTTACATTTG GATCATAGTC TAATTTCATT GCCTTTTTCC 8100AAAATTGAAT CCATTGTTTT TGATTCACGT AGTTTTCTGT ATTCTTAAAA TAAGTTGGTT 8160CCACACATAC CAATACATGC ATGTGCTGAT TATAAGAATT ATCTTTATTA TTTATTGTCA 8220CTTCCGTTGC ACGCATAAAA CCAACAAGAT TTTTATTAAT TTTTTTATAT TGCATCATTC 8280GGCGAAATCC TTGAGCCATA TCTGACAAAC TCTTATTTAA TTCTTCGCCA TCATAAACAT 8340TTTTAACTGT TAATGTGAGA AACAACCAAC GAACTGTTGG CTTTTGTTTA ATAACTTCAG 8400CAACAACCTT TTGTGACTGA ATGCCATGTT TCATTGCTCT CCTCCAGTTG CACATTGGAC 8460AAAGCCTGGA TTTACAAAAC CACACTCGAT ACAACTTTCT TTCGCCTGTT TCACGATTTT 8520GTTTATACTC TAATATTTCA GCACAATCTT TTACTCTTTC AGCCTTTTTA AATTCAAGAA 8580TATGCAGAAG TTCAAAGTAA TCAACATTAG CGATTTTCTT TTCTCTCCAT GGTCTCACTT 8640TTCCACTTTT TGTCTTGTCC ACTAAAACCC TTGATTTTTC ATCTGAATAA ATGCTACTAT 8700TAGGACACAT AATATTAAAA GAAACCCCCA TCTATTTAGT TATTTGTTTA GTCACTTATA 8760ACTTTAACAG ATGGGGTTTT TCTGTGCAAC CAATTTTAAG GGTTTTCAAT ACTTTAAAAC 8820ACATACATAC CAACACTTCA ACGCACCTTT CAGCAACTAA AATAAAAATG ACGTTATTTC 8880TATATGTATC AAGATAAGAA AGAACAAGTT CAAAACCATC AAAAAAAGAC ACCTTTTCAG 8940GTGCTTTTTT TATTTTATAA ACTCATTCCC TGATCTCGAC TTCGTTCTTT TTTTACCTCT 9000CGGTTATGAG 
TTAGTTCAAA TTCGTTCTTT TTAGGTTCTA AATCGTGTTT TTCTTGGAAT 9060TGTGCTGTTT TATCCTTTAC CTTGTCTACA AACCCCTTAA AAACGTTTTT AAAGGCTTTT 9120AAGCCGTCTG TACGTTCCTT AAGG 9144 303 base pairs nucleic acid singlelinear DNA (genomic) not provided 80 GGGCCGGTCG AGGTGGACAA GGTGCAGGTGCAGCCGCTGC TGCTGCGGGC GGCGCAGGTC 60 AAGGTGGGTA TGGGGGTTTA GGTTCACAAGGGGCCGGACG TGGTGGCCTT GGTGGTCAGG 120 GTGCTGGCGC GGCAGCCGCT GCGGCAGCTGGTGGTGCTGG TCAGGGCGGT CTTGGCTCAC 180 AAGGGGCCGG TCAAGGCGCT GGTGCAGCAGCAGCTGCCGC TGGCGGTGCA GGCCAAGGTG 240 GATATGGTGG CTTAGGGTCA CAAGGGGCCGGGCAAGGTGG TTACGGCGGT CTCGGATCAC 300 AAG 303 303 base pairs nucleic acidsingle linear DNA (genomic) not provided 81 GGGCCGGGCA AGGTGGTTACGGCGGTCTCG GATCACAAGG GGCCGGACGT GGTGGCCTTG 60 GTGGTCAGGG TGCTGGCGCGGCAGCCGCTG CGGCAGCTGG TGGTGCTGGT CAGGGCGGTC 120 TTGGCTCACA AGGGGCCGGTCAAGGCGCTG GTGCAGCAGC AGCTGCCGCT GGCGGTGCAG 180 GCCAAGGTGG ATATGGTGGCTTAGGGTCAC AAGGGGCCGG TCGAGGTGGA CAAGGTGCAG 240 GTGCAGCCGC TGCTGCTGCGGGCGGCGCAG GTCAAGGTGG GTATGGGGGT TTAGGTTCAC 300 AAG 303 303 base pairsnucleic acid single linear DNA (genomic) not provided 82 TCTCAGGGTGCTGGCCAGGG TGGCTATGGT GGCCTGGGAT CTCAAGGCGC TGGTCGCGGT 60 GGCCTGGGTGGCCAGGGTGC AGGTGCTGCT GCTGCTGCGG CTGCTGGTGG TGCAGGTCAG 120 GGTGGTCTGGGATCTCAGGG CGCAGGTCAA GGTGCTGGTG CAGCTGCGGC GGCAGCTGGT 180 GGCGCGGGTCAAGGTGGCTA CGGCGGTTTA GGATCTCAAG GTGCGGGTCG CGGTGGTCAG 240 GGCGCTGGTGCAGCAGCGGC AGCAGCAGGT GGCGCTGGCC AAGGTGGTTA CGGTGGTCTT 300 GGA 303 357base pairs nucleic acid single linear DNA (genomic) not provided 83GGGCCATCCG GCCCAGGTTC TGCGGCAGCG GCAGCAGCGG GCCCAGGGCA GCAGGGGCCG 60GGCGGTTACG GTCCGGGTCA GCAAGGCCCA GGTGGCTACG GCCCAGGCCA ACAGGGGCCA 120TCTGGTCCGG GTAGCGCTGC GGCTGCTGCT GCTGCGGCAG GTCCAGGCGG CTACGGGCCG 180GGCCAACAAG GTCCGGGCGG CTATGGTCCA GGTCAACAGG GGCCGAGCGG TCCAGGTTCC 240GCAGCAGCAG CGGCTGCGGC GGCAGCGGGT CCAGGTGGTT ACGGGCCAGG CCAGCAGGGT 300CCGGGTGGCT ATGGCCCAGG CCAGCAAGGT CCGGGTGGTT ACGGTCCAGG TCAGCAG 357 39base pairs nucleic acid single linear DNA (genomic) not provided 84GATCTCAAGG 
AGCCGGTCAA GGTGGTTACG GAGGTCTGG 39 39 base pairs nucleic acidsingle linear DNA (genomic) not provided 85 GATCCCAGAC CTCCGTAACCACCTTGACCG GCTCCTTGA 39 13 amino acids amino acid unknown unknownpeptide not provided 86 Ser Gln Gly Ala Gly Gln Gly Gly Tyr Gly Gly LeuGly 1 5 10 93 base pairs nucleic acid single linear DNA (genomic) notprovided 87 GATCTCAAGG TGCTGGACGT GGTGGTCTTG GTGGTCAGGG TGCCGGTGCCGCCGCTGCCG 60 CCGCCGCTGG TGGTGCTGGA CAAGGTGGTT TGG 93 93 base pairsnucleic acid single linear DNA (genomic) not provided 88 GATCCCAAACCACCTTGTCC AGCACCACCA GCGGCGGCGG CAGCGGCGGC ACCGGCACCC 60 TGACCACCAAGACCACCACG TCCAGCACCT TGA 93 31 amino acids amino acid unknown unknownpeptide not provided 89 Ser Gln Gly Ala Gly Arg Gly Gly Leu Gly Gly GlnGly Ala Gly Ala 1 5 10 15 Ala Ala Ala Ala Ala Ala Gly Gly Ala Gly GlnGly Gly Leu Gly 20 25 30 81 base pairs nucleic acid single linear DNA(genomic) not provided 90 GATCTCAGGG AGCTGGTCAA GGTGCCGGTG CTGCTGCCGCTGCTGCCGGA GGTGCCGGTC 60 AGGGTGGATA CGGTGGACTT G 81 81 base pairsnucleic acid single linear DNA (genomic) not provided 91 GATCCAAGTCCACCGTATCC ACCCTGACCG GCACCTCCGG CAGCAGCGGC AGCAGCACCG 60 GCACCTTGACCAGCTCCCTG A 81 27 amino acids amino acid unknown unknown peptide notprovided 92 Ser Gln Gly Ala Gly Gln Gly Ala Gly Ala Ala Ala Ala Ala AlaGly 1 5 10 15 Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly 20 25 90 basepairs nucleic acid single linear DNA (genomic) not provided 93GATCTCAGGG TGCTGGTAGA GGTGGACAAG GTGCCGGAGC TGCCGCTGCC GCTGCCGGTG 60GTGCTGGTCA AGGAGGTTAC GGTGGTCTTG 90 90 base pairs nucleic acid singlelinear DNA (genomic) not provided 94 GATCCAAGAC CACCGTAACC TCCTTGACCAGCACCACCGG CAGCGGCAGC GGCAGCTCCG 60 GCACCTTGTC CACCTCTACC AGCACCCTGA 9030 amino acids amino acid unknown unknown peptide not provided 95 SerGln Gly Ala Gly Arg Gly Gly Gln Gly Ala Gly Ala Ala Ala Ala 1 5 10 15Ala Ala Gly Gly Ala Gly Gln Gly Gly Tyr Gly Gly Leu Gly 20 25 30 588base pairs nucleic acid single linear DNA (genomic) not provided 96ATGCATTGTC TCCACATTGT 
ATGCTTCCAA GATTCTGGTG GGAATACTGC TGATAGCCTA 60ACGTTCATGA TCAAAATTTA ACTGTTCTAA CCCCTACTTG ACAGCAATAT ATAAACAGAA 120GGAAGCTGCC CTGTCTTAAA CCTTTTTTTT TATCATCATT ATTAGCTTAC TTTCATAATT 180GCGACTGGTT CCAATTGACA AGCTTTTGAT TTTAACGACT TTTAACGACA ACTTGAGAAG 240ATCAAAAAAC AACTAATTAT TCGAAACGAT GAGATTTCCT TCAATTTTTA CTGCAGTTTT 300ATTCGCAGCA TCCTCCGCAT TAGCTGCTCC AGTCAACACT ACAACAGAAG ATGAAACGGC 360ACAAATTCCG GCTGAAGCTG TCATCGGTTA CTCAGATTTA GAAGGGGATT TCGATGTTGC 420TGTTTTGCCA TTTTCCAACA GCACAAATAA CGGGTTATTG TTTATAAATA CTACTATTGC 480CAGCATTGCT GCTAAAGAAG AAGGGGTATC TCTCGAGAAA AGAGAGGCTG AAGCTTACGT 540AGAATTCCCT AGGGCGGCCG CGAATTAATT CGCCTTAGAC ATGACTGT 588 93 amino acidsamino acid unknown unknown peptide not provided 97 Met Arg Phe Pro SerIle Phe Thr Ala Val Leu Phe Ala Ala Ser Ser 1 5 10 15 Ala Leu Ala AlaPro Val Asn Thr Thr Thr Glu Asp Glu Thr Ala Gln 20 25 30 Ile Pro Ala GluAla Val Ile Gly Tyr Ser Asp Leu Glu Gly Asp Phe 35 40 45 Asp Val Ala ValLeu Pro Phe Ser Asn Ser Thr Asn Asn Gly Leu Leu 50 55 60 Phe Ile Asn ThrThr Ile Ala Ser Ile Ala Ala Lys Glu Glu Gly Val 65 70 75 80 Ser Leu GluLys Arg Glu Ala Glu Ala Tyr Val Glu Phe 85 90 30 base pairs nucleic acidsingle linear DNA (genomic) not provided 98 CAACTAATTA TTCGAAACGATGAGATTTCC 30 23 base pairs nucleic acid single linear DNA (genomic) notprovided 99 CTGAGGAACA GTCATGTCTA AGG 23 30 base pairs nucleic acidsingle linear DNA (genomic) not provided 100 GGAAATCTCA TCGTTTCGAATAATTAGTTG 30 23 base pairs nucleic acid single linear DNA (genomic) notprovided 101 GAAACGCAAA TGGGGAAACA ACC 23 9 amino acids amino acidunknown unknown peptide not provided 102 Met Gly Ser His His His His HisHis 1 5 32 base pairs nucleic acid single linear DNA (genomic) notprovided 103 AATTATGGGA TCCCATCACC ATCACCATCA CT 32 32 base pairsnucleic acid single linear DNA (genomic) not provided 104 AATTAGTGATGGTGATGGTG ATGGGATCCC AT 32 6 amino acids amino acid unknown unknownpeptide not provided 105 Phe Gly Ser Gln Gly Ala 1 5 23 base pairsnucleic acid single 
linear DNA (genomic) not provided 106 AATTCGGATC CCAGGGTGCT TAA 23 23 base pairs nucleic acid single linear DNA (genomic) not provided 107 GGCCTTAAGC ACCCTGGGAT CCG 23
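As an illustrative check on the listing above, the following Python sketch tests whether the oligonucleotides of SEQ ID NO:103 and SEQ ID NO:104 are complementary and whether their duplex region encodes the peptide of SEQ ID NO:102. The function names, the assumed 4-base 5′ overhang on each strand, and the partial codon table are assumptions made for this sketch only; they are not part of the disclosure.

```python
# Oligonucleotides copied from the sequence listing (SEQ ID NO:103 and 104).
SEQ_103 = "AATTATGGGATCCCATCACCATCACCATCACT"
SEQ_104 = "AATTAGTGATGGTGATGGTGATGGGATCCCAT"

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the codons that occur in this linker (assumption
# made to keep the sketch short; it is not a full genetic code).
CODONS = {"ATG": "Met", "GGA": "Gly", "TCC": "Ser", "CAT": "His", "CAC": "His"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def duplex_core(top, bottom, overhang=4):
    """Return the double-stranded region formed by two annealed oligos.

    Assumes (hypothetically) that each oligo carries a 5' single-stranded
    overhang of `overhang` bases; raises ValueError if the rest does not pair.
    """
    core = top[overhang:]
    if reverse_complement(bottom)[:len(core)] != core:
        raise ValueError("oligonucleotides are not complementary")
    return core


def translate(dna):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return [CODONS[dna[i:i + 3]] for i in range(0, usable, 3)]


core = duplex_core(SEQ_103, SEQ_104)
peptide = translate(core)
print(" ".join(peptide))
```

Under these assumptions the translated duplex region can be compared directly with the nine residues listed for SEQ ID NO:102.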

I claim:
 1. A nucleic acid molecule having the sequence selected from the group consisting of: SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82 and SEQ ID NO:83.
 2. A nucleic acid molecule encoding a fiber-forming spider silk variant protein comprising from 1 to 16 tandem repeats of the polypeptide selected from the group consisting of SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:60 and SEQ ID NO:62.
 3. A plasmid comprising the nucleic acid molecule of claim 2 operably and expressibly linked to a suitable promoter.
 4. A plasmid as recited in claim 3 wherein the nucleic acid molecule is flanked on either the 5′ end or the 3′ end by a DNA fragment encoding a series of between 4 and 20 histidine residues.
 5. A transformed host cell comprising the plasmid of claim 3 wherein the nucleic acid molecule is flanked on either the 5′ end or the 3′ end by a DNA fragment encoding a series of between 4 and 20 histidine residues.
 6. A host cell transformed with a plasmid comprising the nucleic acid molecule of claim 2, the host cell capable of secreting spider silk variant protein into the cell growth media.
 7. A nucleic acid molecule comprising from 1 to 16 tandem repeats of the nucleic acid selected from the group consisting of SEQ ID NO:80, SEQ ID NO:81, SEQ ID NO:82, and SEQ ID NO:83.
 8. The transformed E. coli host FP3350 identified by the ATCC number ATCC 69328.
 9. The transformed Bacillus subtilis host FP2193, identified by the ATCC number ATCC 69327.
 10. A universal expression vector pFP204, useful for the expression of spider silk variant proteins, identified by the ATCC number ATCC 69326.
 11. A spider dragline variant protein wherein the full length variant protein is defined by the formula: [A[C]GQGGYGGLGXQGAGRGGLGGQGAGAnGG]z wherein X=S, G or N; n=0-7 and z=1-75, and wherein the value of z determines the number of repeats in the variant protein and wherein the formula encompasses variations selected from the group consisting of: (a) when n=0, the sequence encompassing AGRGGLGGQGAGAnGG is deleted; (b) deletions other than the poly-alanine sequence, limited by the value of n will encompass integral multiples of three consecutive residues; (c) the deletion of GYG in any repeat is accompanied by deletion of GRG in the same repeat; and (d) where a first repeat where n=0 is deleted, the first repeat is preceded by a second repeat where n=6; and wherein the full-length protein is encoded by a gene or genes and wherein said gene or genes are not endogenous to the Nephila clavipes genome.
 12. A spider dragline variant protein wherein the full length silk variant protein is defined by the formula: [GPGGYGPGQQGPGGYGPGQQGPGGYGPGQQGPSGPGSAn]z wherein n=6-10 and z=1-75 and wherein, excluding the poly-alanine sequence, individual repeats differ from the consensus repeat sequence by deletions of integral multiples of five consecutive residues consisting of one or both of the pentapeptide sequences GPGGY or GPGQQ and wherein the full-length protein is encoded by a gene or genes and wherein the gene or genes are not endogenous to the Nephila clavipes genome.
 13. A transformed host cell comprising: A) a host cell selected from the group consisting of E. coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Pichia pastoris, Aspergillus sp., and Streptomyces sp.; and B) a plasmid comprising a nucleic acid molecule i) selected from the group consisting of a) a nucleic acid molecule encoding a fiber-forming spider silk variant protein comprising from 1 to 16 tandem repeats of DP-1A.9 (SEQ ID NO:20); b) a nucleic acid molecule encoding a fiber-forming spider silk variant protein comprising from 1 to 16 tandem repeats of DP-1B.9 (SEQ ID NO:22); c) a nucleic acid molecule encoding a fiber-forming spider silk variant protein comprising from 1 to 16 tandem repeats of DP-1B.16 (SEQ ID NO:62); and d) a nucleic acid molecule encoding a fiber-forming spider silk variant protein comprising from 1 to 16 tandem repeats of DP-2A (SEQ ID NO:60) operably and expressibly linked to a suitable promoter; and ii) optionally flanked on either the 5′ end or the 3′ end by a DNA fragment encoding a series of between 4 and 20 histidine residues, wherein the plasmid expresses a spider silk variant protein at levels between 1 mg and 300 mg of full-length protein per gram of dry cell mass.
 14. A polypeptide selected from the group consisting of SEQ ID NO:20, SEQ ID NO:22, SEQ ID NO:60, and SEQ ID NO:62.
 15. A polypeptide selected from the group consisting of SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:61, and SEQ ID NO:63.
 16. A spider silk variant protein comprised of from 1 to 16 tandem repeats of a polypeptide selected from the group consisting of SEQ ID NO:21, SEQ ID NO:23, SEQ ID NO:61, and SEQ ID NO:63.
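
For illustration only, the repeat formula recited in claim 12 can be expanded into an explicit amino acid string for chosen values of n and z. The Python sketch below is a minimal reading of that formula under stated assumptions: the function name is hypothetical, and the sketch builds only the undeleted consensus repeat, so it does not model the pentapeptide deletions (GPGGY or GPGQQ) that the claim also permits.

```python
# Non-alanine portion of the consensus repeat, copied from the claim 12 formula.
CONSENSUS_CORE = "GPGGYGPGQQGPGGYGPGQQGPGGYGPGQQGPSGPGS"


def expand_consensus(n, z):
    """Expand [core + A*n] * z for the ranges recited in claim 12.

    Hypothetical helper: it returns the consensus sequence only and does
    not generate variants carrying pentapeptide deletions.
    """
    if not 6 <= n <= 10:
        raise ValueError("n must be between 6 and 10")
    if not 1 <= z <= 75:
        raise ValueError("z must be between 1 and 75")
    return (CONSENSUS_CORE + "A" * n) * z


# Example: two repeats, each ending in a run of six alanines.
sequence = expand_consensus(n=6, z=2)
print(len(sequence), sequence)
```

Each repeat contributes 37 non-alanine residues plus n alanines, so the length of the expanded string is z × (37 + n).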